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Abstract 

We survey and unify recent results on the existence of accurate algorithms for evaluating 
multivariate polynomials, and more generally for accurate numerical linear algebra with struc- 
tured matrices. By "accurate" we mean that the computed answer has relative error less than 
1, i.e., has some correct leading digits. We also address efficiency, by which we mean algorithms 
that run in polynomial time in the size of the input. Our results will depend strongly on the 
model of arithmetic: Most of our results will use the so-called Traditional Model (TM), where 
the computed result of op(a, b), a binary operation like a + b, is given by op(a, b) * (1 + S) where 
all we know is that \5\ < e <C 1. Here e is a constant also known as machine cpsilon. 

We will see a common reason that the following disparate problems all permit accurate and 
efficient algorithms using only the four basic arithmetic operations: finding the eigenvalues of 
a suitably discretized scalar elliptic PDE, finding eigenvalues of arbitrary products, inverses, or 
Schur complements of totally nonnegative matrices (such as Cauchy and Vandermonde), and 
evaluating the Motzkin polynomial. Furthermore, in all these cases the high accuracy is "de- 
served", i.e., the answer is determined much more accurately by the data than the conventional 
condition number would suggest. 

In contrast, we will see that evaluating even the simple polynomial x + y + z accurately is 
impossible in the TM, using only the basic arithmetic operations. We give a set of necessary 
and sufficient conditions to decide whether a high accuracy algorithm exists in the TM, and 
describe progress toward a decision procedure that will take any problem and provide either a 
high accuracy algorithm or a proof that none exists. 

When no accurate algorithm exists in the TM, it is natural to extend the set of available 
accurate operations by a library of additional operations, such as x + y + z, dot products, or 
indeed any enumerable set which could then be used to build further accurate algorithms. We 
show how our accurate algorithms and decision procedure for finding them extend to this case. 

Finally, we address other models of arithmetic, and the relationship between (im)possibility 
in the TM and (in)efficient algorithms operating on numbers represented as bit strings. 
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1 Introduction 



A result of a computation will be called accurate if it has a small relative error, in particular less 
than 1 (i.e., some leading digits must be correct). Now we can ask what the following problems 
have in common: 

1. Accurately evaluate the Motzkin polynomial 

I \ 3,2 2/2,2 o2\ 

p{x,y,z) = z + x y (x +y — 3z ) . 

2. Accurately compute the entries or eigenvalues of a matrix gotten by performing an arbitrary 
sequence of operations chosen from the set {multiplication, J-inversion, Schur complement, 
taking submatrices}, starting from a set of Totally Nonnegative (TN) matrices such as the 
Hilbert matrix, TN generalized Vandermonde matrices, etc. 

3. Accurately find the eigenvalues of a suitably discretized scalar elliptic PDE. 

We also ask how they all differ from the apparently much easier problem of evaluating x + y + z. 

The answer will depend strongly on our model of arithmetic. For most of this paper we will use 
the Traditional Model (TM) of arithmetic, that the computed result of op(a, b), a binary operation 
like a + 6, is given by op(a, b) - (1 + 5) where all we know is that \5\ < e <C 1. Here e is a real constant 
also known as machine precision. We will refer to rnd(op(a,b)) = op(a,b)(l + 5) as the rounded 
result of op(a,b). We will distinguish between the cases where the other quantities (including 5s) 
are all real, or all complex. 

To see why some expressions may or may not be evaluable accurately in the TM, consider 
multiplying or dividing two numbers each known to relative error r\ < 1: then their rounded product 
or quotient is clearly correct with relative error 0(max(r/, e)). This also holds when adding two 
like-signed real numbers (or subtracting real numbers with opposite signs). In contrast, subtracting 
two like-signed real numbers x — y can lead to cancellation of leading digits: If x and y themselves 
have nonzero relative error bounds, then depending on the extent of cancellation, x — y may have 
an arbitrary relative error. On the other hand if x and y are exact inputs, then rnd(x ± y) = 
(x ± y)(l + 5) is also known with small relative error. In other words, an easy sufficient (but not 
necessary!) condition in the TM for an algorithm to be accurate is "No Inaccurate Cancellation": 

NIC: The algorithm only (1) multiplies, (2) divides, (3) adds (resp., subtracts) real numbers with 
like (resp., differing) signs, and otherwise only (4) adds or subtracts input data. 

Sometimes we will also include the square root among our allowed operations in NIC 1 . 

In the TM, with real numbers, the three problem listed above all have novel accurate algorithms 
that use only four basic arithmetic operations (+, — , x and /), comparison and branching, and 
satisfy NIC. Furthermore, the matrix algorithms are efficient, running in 0(n 3 ) time (we say more 
about efficiency below). These linear algebra algorithms depend on some recently discovered matrix 
factorizations and update formulas, and the algorithm for the Motzkin polynomial (surprisingly) 
fills a page with 8 cases. In contrast, with complex arithmetic, no accurate algorithms exist. Nor is 

1 However, square roots require more care in bounding the relative error: In floating point arithmetic on most 

1/2 100 2 100 

computers, computing y = x 1 by 100 square roots and then z = y by 100 squarings, yields z = 1 independently 
of x > 0. 
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Figure 1: Eigenvalues of the 20th Schur Complement of the 40-by-40 Vandermonde matrix Vy = 
i 3 , computed both using a Conventional algorithm (x) and and Accurate algorithm (+). 



there an accurate algorithm using only these operations, in the real or complex case, for evaluating 
x + y + z accurately. 

For example, consider Figure 1, which shows the eigenvalues of a matrix gotten by taking 
the trailing 20-by-20 Schur complement of a 40-by-40 Vandermonde matrix. Both the eigenvalues 
computed by our algorithm (in standard double precision floating point arithmetic), and by a con- 
ventional algorithm are shown. Note that every eigenvalue computed by the conventional algorithm 
is wrong by orders of magnitude, whereas all ours are correct to nearly 14 digits, as confirmed by 
a very high precision calculation. 

Section 2 of this paper will survey a great many other examples of structured matrices where 
accurate and efficient linear algebra algorithms are possible using NIC as the main (but not only) 
tool; see Table 1 for a summary. 

One may wonder whether this accuracy is an "overkill" , because small uncertainties in the data 
might cause much larger uncertainties in the computed results. In this case computing results to 
high accuracy would be more than the data deserves, and not worth any additional cost. Indeed 
the usual condition numbers of the problems considered here are usually enormous. However, 
their structured condition numbers are often quite modest, justifying computing the answers to 
high accuracy. For example, while a Cauchy matrix CV,- = l/(xi + yj) such as the Hilbert matrix 
(xi = i = 1 + y%) is considered badly conditioned since k(C) = ||C|| • HC" 1 )! can be very large, the 
entries of C _1 are actually much less sensitive functions of Xj and yj than k(C) would indicate. 
Indeed, if the answer is given by a formula satisfying NIC, then the condition number can only 
be large when cancellation occurs when computing x ± y for uncertain input data x and y; each 
such expression adds the quantity l/rel_gap(x, y) = (|x| + |y|)/|x±y|to the structured condition 
number. This is true of all the examples in Section 2, justifying their more accurate computation 
than would the usual condition number. 
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The profusion and diversity of these examples naturally raises the question as to what mathe- 
matical property they share that makes these algorithms possible. Section 3 of this paper addresses 
this, by describing progress towards a decision procedure for the more basic problem of deciding 
whether a given multivariate polynomial can be evaluated accurately using the basic rounded arith- 
metic operations, comparison, and branching. The answer will depend not just on the polynomial, 
but whether the data is real or complex, and on the domain of evaluation (a smaller domain may 
be easier than a larger one, if it eliminates difficult arguments). This decision procedure would 
yield simpler necessary and sufficient conditions (not identical in all cases) that tell us whether the 
algorithms in Section 2 (or others not yet discovered) must exist (we will use the fact that accurate 
determinants are necessary and often sufficient for accurate linear algebra) . It will turn out that the 
results for real arithmetic are much more complicated than for complex arithmetic, where simple 
necessary and sufficient condition may be stated (the answer is basically given by NIC above); this 
reflects the difference between algebraic geometry over the real and complex numbers. 

One negative result of Section 3.3 will be the impossibility of evaluating x + y + z using only 
the basic rounded arithmetic operations. This seems odd, since x + y + z is so simple. But it is 
only simple if we use the fact that in practice (floating point arithmetic), x, y and z are represented 
by finite bit strings that can be manipulated and analyzed differently than by assuming only that 
rnd(op(a, b)) = op(a, b)(l + 5) with |<5| < e. To go further we must extend our model of arithmetic. 
We do so in two ways. 

Section 3.4 continues by adding so-called "black-box" operations to the basic arithmetic opera- 
tions. For example, one could assume that a subroutine for the accurate evaluation of x + y + z (or 
of dot products, or of 3-by-3 determinants, etc.) also existed, and then ask the analogous question 
as to what other polynomials could be accurately evaluated, using this subroutine as a building 
block. This indeed models computational practice, where subroutine libraries of such black-box 
routines are provided in order to build accurate algorithms for other more complicated polynomi- 
als. In Section 3.4 we also describe how to extend our decision procedures when an arbitrary set of 
such black-box routines is available, and the question is whether another polynomial not already 
in the set can be evaluated accurately. A positive result will be showing that just the ability to 
compute 2-by-2 determinants accurately is enough to permit accurate and efficient linear algebra 
on the inverses of tridiagonal matrices. A negative result will be the impossibility of accurate linear 
algebra with Toeplitz matrices, given any set of block-box operations of bounded degree or with a 
bounded number of arguments. 

Sections 3.3 and 3.4 go some way to describing the possibilities and limits of solving numerical 
problems accurately in practice. But "in practice" means using finite representations with bits, i.e., 
floating point, in which case accurate (even exact) polynomial evaluation is always possible, and 
the only question is cost. In Section 4, after a brief discussion of other arithmetic models, we will 
settle on one model we believe best captures the spirit of actual floating point computation, but 
without limiting it to fixed word sizes: an arbitrary pair of integers (to, e) is used to represent the 
floating point number to • 2 6 . In this model, we describe how the algorithms in Section 2 lead to 
efficient algorithms that run in time polynomial in the size of the inputs, the usual computer science 
notion of efficiency. In contrast, conventional algorithms simply run in high enough precision to 
get an accurate answer do not run in polynomial time. 

Finally, in Section 5 we consider the structured condition numbers for the problems we consider, 
which can be much smaller that the usual unstructured condition numbers and so justify accuracy 
computation. In prior work [10], the first author observed that for many problems the condition 
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number of the condition number was approximately equal to the condition number of the original 
problem, and that this corresponded to the geometric property that the condition number was the 
reciprocal of the distance to the nearest ill-posed (or singular) problem. These observations apply 
here, with the following interesting consequence: for the examples considered here it is possible to 
compute the solution to a problem accurately if and only if it is possible to estimate its condition 
number accurately. An analogous phenomenon was already observed in [12]. 

2 Accurate and efficient algorithms for linear algebra 
2.1 Introduction 

The numerical linear algebra problems we will consider include computing the product of matrices, 
the Schur complement, the determinant or other minor, the inverse, the solution to a linear system 
or least squares problem, and various matrix decompositions such as LDU (with or without pivoting) 
QR, SVD (singular value decomposition), and EVD (eigenvalue decomposition). 

Conventional algorithms for these problems are at best only backward stable: When applied to 
a matrix A they compute the exact solution of a nearby problem A + 6 A, where \\5A\\ = 0(e)\\A\\, 
where || • || is some matrix norm and e is machine epsilon. In consequence, the error in the computed 
solution depends on how sensitive the answer is to small changes in A, and is typically bounded in 
norm by = 0(e)k(A), where k(A) is a condition number (a scaled norm of the Jacobian 

of the solution map). Thus we have two ways to lose high relative accuracy: First, bounding the 
error only in norm may provide very weak bounds for tiny solution components; for example the 
error bound for the computed singular values guarantees an absolute error Icr^ true ~~ °i,comp| = 
0(e) maxj a i true' so that the large singular values have small relative errors, but not the small 
ones. Second, when k(A) is large, even large solution components may be inaccurate, as when 
inverting an ill-conditioned matrix. 

However, these conventional algorithms ignore the structure of the matrix, which is critical 
to our approach. Rather than treating, say, a Cauchy matrix C as a collection of n 2 independent 
entries Cj,- = l/(xi+yj), we treat it as a function of its 2n parameters xi and yj. Starting from these 
In parameters, we can find accurate expressions (because they satisfy NIC) for C"s determinant 
det(C) = ni<j( x « ~~ x j)(lH ~ Vj)/ Yli j( x i + Vj) an d other linear algebra problems. As mentioned 
in Section 1, expressions satisfying NIC also imply that their structured condition numbers can be 
arbitrarily smaller than their conventional condition numbers. 

Now we outline our general approach to these problems. First we consider the problems whose 
solutions are rational functions of the parameters, such as computing a determinant or minor. 
Indeed, all these solutions can be expressed using minors or quotients of minors. For example, 
the entries of the inverse or LDU factorization are (quotients of) minors, the product AB can be 
extracted from 

' I A 
I B 
1 

and the last column of 



I 


A -b 


A T 








1 
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contains the solution of the overdetermined least squares problem min x \\Ax — b\\2- Thus the ability 
to compute certain minors with high relative accuracy is sufficient to solve these linear algebra 
problems with high relative accuracy. Conversely, knowing a factorization like LDU with high 
relative accuracy yields the determinant with similar accuracy (via the product ±Y\ i Dn). Thus 
we see that matrix structures that permit accurate computations of certain determinants are both 
necessary and sufficient for solution of these linear algebra problems with high relative accuracy. In 
this section we will identify a number of matrix structures that permit such accurate determinants 
to be calculated. 

Second, we consider the EVD and SVD, which involve more general algebraic functions of the 
matrix entries. To compute these accurately, we need other tools, which we will summarize below 
in Section 2.2. Briefly, our approach will be to compute one of several other matrix decompositions 
using only rational operations (and possibly square roots), and then apply iterative schemes to 
these decompositions that have accuracy guarantees. 

Efficient conventional algorithms (i.e., using 0(n?) arithmetic operations) exist for each of the 
above problems and are available in free packages (e.g., LAPACK [3]) or embedded in commercial 
ones (e.g., MATLAB [56]). So an extra challenge is to find not just accurate algorithms, but ones 
that also take C(n 3 ) operations. 

Our results, using only NIC, are summarized in Table 1, which describes (in a O(-) sense) 
the speed of the fastest known accurate algorithm for each problem shown. There is one column 
for each linear algebra problem considered, and one row for each structured matrix class. The 
abbreviations not yet defined will be explained as we continue. 

The rest of this section is organized as follows. Subsection 2.2 briefly presents accurate algo- 
rithms for the EVD and SVD. Subsection 2.3 walks through Table 1 row by row, again briefly 
explaining the results. Finally, Subsection 2.4 explains how much more is possible if we expand the 
class of formulas we may use beyond NIC in a certain disciplined way. This naturally raises the 
question of whether or not there is a systematic method to recognize such formulas, which is the 
final topic of this paper. 

2.2 Tools for computing EVD and SVD accurately 
2.2.1 Rank revealing decompositions and SVD 

The first accurate SVD algorithm depends on a Rank Revealing Decomposition (RRD) [15] of matrix 
A, a factorization A = XDY where D is nonsingular and diagonal, and X and Y T have full column 
rank and are "well-conditioned". Note that A may be rectangular or singular. The most obvious 
example of an RRD is the SVD, where X and Y are as well-conditioned as possible. Other examples 
where X and Y are (nearly always) well-conditioned come from Gaussian elimination with complete 
pivoting A = LDU, or from QR with complete pivoting A = QDR (more sophisticated pivoting 
techniques with better condition bounds on the unit triangular factors are available [7, 8, 32, 37, 
38, 58, 73]). An RRD A = XDY has two attractive properties 

1. Given the RRD, it is possible to compute the SVD to high relative accuracy in the following 
sense [15, Section 3], [18, Algorithm 2]: 

• The relative error in each singular value o~i is bounded by C(e max(«;(X), n(Y))), where 
k(X) = \\X\\ ■ \\X\l~ 1 . 
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• The relative error in the ith computed (left or right) singular vector is bounded by 
0(emax(fv(X),«;(y r ))/ min^ rel_gap(o"j, aj). In other words, the condition number can 
only be large if the singular value agrees with another one to many leading digits, no 
matter how small they are in absolute value. 

2. These error bounds do not change if the RRD is known only approximately (either because of 
uncertainty in A or roundoff in computing the RRD), as long as [15, Theorem 2.1], [26, 49]: 

• We can compute X where \\X — X\\ = C(e)||X||. 

• We can compute a diagonal D where \Da — Da \ = 0(e)\Da\. 

• We can compute Y where \\Y — Y\\ = 0(e)\\Y\\. 

In other words, we only need the factors X and Y with high absolute accuracy, not relative 
accuracy, a fact that will significantly expand the scope of applicability. 

Among the various algorithms cited above for computing the SVD, we sketch one [15, Algorithm 
3.2], along with an explanation of its accuracy: 

1) Compute the SVD of XD using one-sided Jacobi, yielding XD = tjW T . Thus A = UY,V T Y. 

2) Multiply W = t,(V T Y), respecting parentheses. Thus A = UW. 

3) Compute the SVD of W using one-sided Jacobi, yielding W = UT.V 7 '. Thus A = UUY.V 7 '. 

4) Multiply U = UU, yielding the SVD A = UY,V T . 

Briefly, the reason this works is that in steps 1) and 3), which potentially combine numbers over 
very wide ranges of magnitude, one-sided Jacobi respects this scaling by, in step 1) for example, 
creating backward errors in column i of XD that are proportional to Da [23, 25, 55]. Furthermore, 
each step costs 0(n 3 ) arithmetic operations. 

2.2.2 Bidiagonal SVD 

The second accurate SVD algorithm depends on a Bidiagonal Reduction (BR) of matrix A, a 
factorization A = UBV T where B is bidiagonal (nonzero on the main and first super-diagonal) and 
U and V are unitary. This is an intermediate factorization in the standard SVD algorithm. If the 
entries of B are determined to high relative accuracy, so is B's SVD in the same sense as the RRD 
determines the SVD as described above (but without any factor like max(«;(X), n(Y)) in the error 
bounds). Furthermore, accurate C(n 3 ) algorithms are available [17, 61]. 

2.2.3 Accurate EVD 

Now we discuss the EVD. Clearly, if A is symmetric positive definite, and a symmetric RRD 
A = XDX T is available, then the SVD and EVD are identical. If A is symmetric indefinite but an 
accurate SVD is attainable, then the only remaining task is assigning correct signs to the singular 
values, which may be done using the algorithms of Dopico, Molera, and Moro [24]. Algorithms for 
computing symmetric RRDs of certain symmetric structured matrices are presented in [48, 62]. 

We also know of two accurate nonsymmetric eigenvalue algorithms, for totally nonnegative 
(TN) and for certain sign regular matrices, which we call TN J ( see Section 2.3.6 for definitions). 
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In the TN case, the trick is to implicitly perform an accurate similarity transformation to a 
symmetric tridiagonal positive definite matrix which is available to us in factored form. The TN 
eigenvalue problem is thus reduced to the bidiagonal SVD problem. 

The sign-regular TN^ matrices are similar to symmetric anti-bidiagonal matrices [36] (i.e., the 
only nonzero entries are on the antidiagonal and one sub-antidiagonal). This similarity can be 
performed accurately by transforming implicitly an appropriate bidiagonal decomposition of the 
TN J matrix. Finally, the eigenvalues of the anti-bidiagonal matrix are its singular values with 
appropriate signs known from theory. 

2.3 Designing accurate algorithms for different structured classes 

In this section we look at the particular approaches in designing accurate algorithms for different 
matrix classes in order to fill the rows of Table 1, explaining only a few in detail. Each row refers 
to a matrix class, and each columns to a linear algebra problem. A table entry n a means that an 
accurate linear algebra algorithm costing 0(n a ) arithmetic operations for the given problem and 
class exists. A "No" entry means that no accurate algorithm using traditional arithmetic exists, 
and indeed no accurate algorithm exists without using arbitrary precision arithmetic, in a sense to 
be made precise in Section 3.5. 

We begin by explaining some of the terser column headings: "Any minor" means that an 
arbitrary minor of the matrix may be computed accurately, not just the determinant. "Gauss, elim 
NP" means Gaussian elimination with No Pivoting (GENP), and similarly "PP" and "CP" refer 
to Partial Pivoting (GEPP) and Complete Pivoting (GECP), resp. "RRD" is a Rank Revealing 
Decomposition as described above (frequently but not always the same as GECP). "NE" is Neville 
Elimination [30], a variation on GENP where L and U are represented as products of bidiagonal 
matrices (corresponding to elimination where a multiple of row i is added to row i + 1 to create 
one zero entry). Az = b refers to solving Az = b accurately given conditions on b (alternating signs 
in its components). 

2.3.1 Acyclic matrices 

A matrix A is called acyclic if its graph is; namely, the bipartite graph with one node for each 
row and one node for each column and an edge (i,j) if Aij is nonzero. Acyclic matrices include 
bidiagonal matrices (see Section 2.2.2), and broken arrow matrices (which are nonzero only on the 
diagonal and one row or one column), among exponentially many other possibilities [14]. 

Acyclic matrices are precisely the class of matrix sparsity patterns with the property that the 
Laplace expansion of each minor can have at most one nonzero term [14]. Thus every nonzero 
minor can be computed accurately as the product of n matrix entries. Any acyclic matrix is also a 
DSTU matrix (see the following section), and so the algorithms for DSTU matrices may be used. 

2.3.2 DSTU (diagonal scaled totally unimodular) matrices 

A matrix A is called Totally Unimodular (TU) if all its minors are 0, 1, or —1. A matrix is Diagonally 
Scaled Totally Unimodular (DSTU) if it is of the form A = D1ZD2, where D\ and D2 are diagonal 
and Z is totally unimodular. 

Accurate LDU and SVD algorithms for DSTU matrices were presented in [11] and are based 
on the following observation: 
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Table 1: Existing algorithms for accurate computations with various classes of structured matrices. 
Entries like n 2 are meant in a big-O sense; see Section 2.1 for details. "No" means no accurate 
algorithms exist without using arbitrary precision arithmetic; see Section 3.5 for details. 
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1. The Schur complement of a DSTU matrix is DSTU. 



2. If at any step in the inner loop of Gaussian elimination the subtraction 



a-kk 

has two nonzero operands, then the result a'^ must be exactly 



/ O'ik^'kj 

a ij = a ij — : — i 1 ) 



In other words, to make Gaussian elimination accurate, a one-line addition is required to test if both 
aij and a ^° fcj are nonzero, and to set a'^ = if they are. Then the modified Gaussian elimination 
satisfies NIC, yielding an accurate LDU decomposition. LDU with complete pivoting yields an 
accurate RRD (with k(L) and k(U) both bounded by 0{n 2 ) [15, Theorem 10.2]), and an accurate 
RRD yields an accurate SVD as discussed in Section 2.2.1. 

If a DSTU matrix is symmetric, Pelaez and Moro derived accurate algorithms that preserve 
and exploit the symmetry in their matrices [62]. They also presented such symmetric algorithms 
for TSC matrices discussed next. 

DSTU matrices arise naturally in the formulation of eigenvalue problems for Sturm-Liouville 
equations [18], and more general scalar elliptic PDE with suitable finite element discretizations [15]. 
We discuss this further below in Section 2.4. 



2.3.3 TSC (total signed compound) matrices 

Let S be the set of all matrices with a given sparsity and sign pattern. S is called sign nonsingular 
(SNS) if it contains only square matrices, and the Laplace expansion of the determinant of each 
G € S is the sum of monomials of like-sign, with at least one nonzero monomial. S is called total 
signed compound (TSC) if every square submatrix of any G € S is either SNS, or structurally 
singular (i.e., no nonzero monomials appear in its determinant expansion). Acyclic matrix are 
obviously a special case of TSC matrices, with at most one monomial appearing in each minor. 

According to [15, Lemma 7.2] any minor of a TSC matrix may be computed accurately using 
not more than 4n — 1 arithmetic operations (and not counting various graph traversal operations). 
With this computing the LDU decomposition of a TSC matrix is easy. If at any step of Gaussian 
elimination the subtraction in (1) is one of same-signed quantities, then a'^ is recomputed as a 
quotient of minors, each of which is computed accurately as above. The total cost could go up to 
C(n 4 ), but this is still efficient, according to our convention. 



2.3.4 Diagonally dominant and M-matrices 

A matrix A is called (row) diagonally dominant if the sums s, = an — \ a ij\ are nonnegative 

for all rows i. If in addition its off-diagonal entries a^ are nonpositive (so that Sj = Y2j a ij) then it 
is called a (row) diagonally dominant M -matrix. It turns out that these off-diagonal matrix entries 
and the Sj, not the diagonal entries an, are the right parameters for doing accurate linear algebra 
with this class of matrices. Intuitively, it is clear that the Sj are the natural parameters since the 
conditions Sj > define the class. 

We explain how to do accurate LDU decomposition with no pivoting or complete pivoting, 
in the case of a row diagonally dominant M-matrix. Briefly, the algorithm can be organized to 
satisfy NIC (see [20, 60, 63] for details). For simplicity of notation, let the n 2 matrix parameters 
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be bij = —ciij and Sj, so all are nonnegative. The diagonal elements, an, are readily available 
accurately as a sum of positive numbers: 

n 

an = Si + y^bjj. (2) 

8=1 

The Schur complements computed using Gaussian elimination with complete or no pivot- 
ing inherit the diagonally dominant M-matrix structure. The parameters defining the Schur 
complement — the row sums (call them s£) and off-diagonal elements (call them a'^ = — 6L) — are 
rational functions with positive coefficients in the Sj's and b^s: 

Si = Si-\ si, 6- . = 6j • H 

an an 

with ajj given by (2). Since the above expressions satisfy NIC, the LDU decomposition computed 
using them will be accurate, as will the subsequent SVD. 

Several improvements on this results have been made. Peha suggested in [63] an alternative 
diagonal pivoting strategy which guarantees L and U to be well conditioned (as opposed to "well 
conditioned in practice" which is what Gaussian elimination with complete pivoting delivers). Ye 
generalized this approach to symmetric diagonally dominant matrices (removing the restriction on 
the signs of off-diagonal elements) [77, 78] . It turns out that in the process of Gaussian elimination 
with complete pivoting updating the Sj and the diagonal entries still satisfies NIC. However, there 
can be (arbitrary) cancellation in the off-diagonal entries. Nonetheless, Ye shows that the errors in 
the off-diagonal entries can be bounded in absolute value so as to be able to guarantee that L and 
U are computed with small norm-wise errors, which is all that is required for an RRD to in turn 
provide an accurate SVD. 

2.3.5 Matrices with displacement rank one 

Matrices A that satisfy the Sylvester equation 

DA — AT = B, 

where B = uv T is unit rank, are said to have displacement rank one. In the easiest case, when 
D and T are diagonal (D = diag(di, dy, ■ ■ ■ , d n ), T = diag(ti, t2, ■ ■ ■ , t n )), A is a (quasi-Cauchy) 
matrix a# = [43, 44]. 

The quasi-Cauchy structure is preserved in the process of Gaussian elimination with complete 
pivoting [11, 15]. The explicit formula for a determinant (or a minor) of a (quasi-) Cauchy matrix 
satisfies NIC as mentioned before. In fact, Gaussian elimination can be made accurate still at a 
cost of C(n 3 ) just by changing the inner loop from (1) to 

/ _ {dj - d k ){t k - tj) 

a a- a ir {dk _ t . ){di _ tk) 

This is the starting point in computing the SVD of many displacement rank one matrices. The 
Vandermonde matrix V = [xj 1 ] " has a displacement rank one, where D = diag(»i, X2, • • • , x n ) 
and T is the lower shift matrix ti %—\ = 1, i = 1, 2, . . . , n — 1, t\ n = 1. 
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Then DA— AT = (ajj-l.x^-l, . . . ,<-l) T (0,0, ... ,0,1) = The matrix T is circulant (and a 
root of unity) and is diagonalized T = QAQ* by the (unitary) matrix of the DFT Qij = a^ -1 '^' -1 ^, 
where a is a primitive nth root of unity, with eigenvalues An = 1 )( ri_1 ). 

Thus DA — AQAQ* = B, and so D{AQ) — {AQ)A = BQ, i.e., AQ is a quasi-Cauchy matrix 
(since BQ still has rank one). Now from an accurate SVD of AQ = WEV* we automatically obtain 
an accurate SVD of A = UT,(QV)* . But note that we need both the constant matrices Q and A 
for this to work, which goes beyond NIC. 

The same idea generalizes to other displacement rank one matrices. For example, if DA — AQ = 
B and D and T are unitarily diagonalizable, D = QD\Q* and T = SD2S*, then 

D 1 (Q* l AQ 2 ) - {Q\AQ 2 )D 2 = (Qlu)(v T Q 2 ) 

and Q\AQ 2 is a quasi-Cauchy matrix. If the decompositions D = QD\Q* and T = SD 2 S*, and the 
products Q\u and v T Q 2 can be formed accurately, then from an accurate SVD of the quasi-Cauchy 
matrix Q\AQ 2 = UTV* we obtain an accurate SVD of A: A = {QiU)T,{Q 2 V)* . This approach 
works, e.g., for polynomial Vandermonde matrices involving orthogonal polynomials [22] (see also 
[15, 11, 34, 44]), but again requires knowing certain constants accurately, thus going beyond NIC. 

2.3.6 Totally nonnegative and TN J sign regular matrices 

The matrices all of whose minors are nonnegative are called Totally Nonnegative (TN). Despite 
this seemingly severe restriction on the minors, TN matrices arise frequently in practice — a Van- 
dermonde matrix with positive and increasing nodes, the Pascal matrix, and the Hilbert matrix 
are all examples of TN matrices. The first reference in the literature (that we are aware of) for 
accurate matrix computations dates back to 1963 for a Vandermonde matrix with positive and in- 
creasing nodes in an example of Kahan and Farkas [40, 42, 41]. This phenomenon was rediscovered 
by Bjorck and Pereyra in their celebrated paper [4] and later carefully analyzed and generalized 
[6, 33, 35, 51, 21, 52, 53, 54]. All these methods are based on explicit decompositions of the cor- 
responding matrices where all entries of the decompositions may be computed with expressions 
satisfying NIC. 

These ideas generalize to any TN matrix [46, 47] and are based on a structure theorem for TN 
matrices [27, 30, 31]: Any nonsingular TN matrix can be decomposed as a product of nonnegative 
bidiagonal factors: 

A = L^L® ■ ■ ■ L^DU^ ■ ■ ■ (3) 

As mentioned before, this variation on Gaussian elimination, called Neville elimination, arises by 
eliminating all off-diagonal matrix entries by adding a multiple of row (resp., column) i to row (resp., 
column) i + \ to zero out one entry, and eliminating entries diagonal by diagonal, from the outermost 
(with row (resp., column) multipliers stored in ZA 1 ) (resp., U^)) to innermost (with row (resp., 
column) multipliers stored in L^™ -1 ) (resp., [/( n_1 ))). There are exactly n 2 independent nonnegative 
parameters in the above decomposition. They parameterize the space of all TN matrices. 

It turns out that it is possible to perform essentially all linear algebra on TN matrices by 
using only TN-preserving transformations. In other words, given the parameterization of A in 
(3), it is possible to accurately compute the parameterization of a submatrix, (unsigned) inverse, 
Schur complement, converse, or product of two such matrices, all in 0(n 3 ) time and satisfying 
NIC [47]. In other words, the ability to do accurate linear algebra is "closed" under all these 
operations. Furthermore, based on NIC, it is possible to accurately reduce such a parameterized 



13 



matrix to bidiagonal form, enabling an accurate SVD, and to accurately reduce it to tridiagonal 
form T = BB T by a similarity, reducing the nonsymmetric eigenvalue problem to an accurate SVD 
[46]. Thus, virtually all linear algebra with TN matrices can be performed accurately. 

The only remaining question is about the starting point of this approach - the accurate bidiago- 
nal decompositions of the original matrix. The entries of the bidiagonal decomposition are products 
of quotients of initial minors (i.e., contiguous minors that include the first row or column). Thus 
for virtually all well known TN matrices - Pascal, Vandermonde, Cauchy (as well as their products, 
Schur complements, etc.) there are accurate formulas for their computation [6, 46, 52, 54]. 

A matrix is sign regular [29] if all minors of the same order have the same sign (but not 
necessarily all positive as is the case with TN matrices). A row- or a column-reversed TN matrix is 
sign regular, and the class is such matrices is denoted TN" 7 . Most linear algebra problems for TN J 
matrices follow trivially from the corresponding TN algorithms, except for the eigenvalue algorithm 
[47], which requires a TN^-preserving transformation into a symmetric anti-bidiagonal matrix. 

We believe that the eigenvalue algorithms for TN and TN 17 are the first examples of accurate 
eigenvalue algorithms for nonsymmetric matrices. 

2.4 Going beyond NIC (no inaccurate cancellation) 

We have cited several examples where we can do more general classes of accurate structured matrix 
computations by using more general building blocks than permitted by insisting on no inaccurate 
cancellation (NIC). 

An accurate SVD of a Vandermonde matrix required knowing roots of unity accurately (or more 
precisely, being able to perform the operation x — a accurately, where a is a root of unity). More 
general displacement rank one problems required similar accurate operations for constants a drawn 
from eigenvalues from a fixed sequence of matrices, as well as the knowledge of the orthogonal 
eigenvectors of these matrices. 

Most interestingly, by allowing ourselves to accurately compute a given set of polynomials, but 
all of bounded numbers of terms and degrees, we can extend our DSTU approach from being able 
to accurately find eigenvalues of only rather simply discretized differential equations, to accurately 
compute all the eigenvalues of the scalar elliptic partial differential equation V • (9Vu) + Xpu = 
on a domain £1 with zero Dirichlet boundary conditions, where 9(x) and p{x) are scalar functions 
discretized on a general triangulated mesh in a standard way (isoperimetric finite elements on a 
triangulated mesh). In this case it is the smallest eigenvalues that are of physical interest, and they 
are accurately determined by the coefficients of the PDE. This result depends on a novel matrix 
factorization of the discretized differential operator in [5]. 

It is examples such as these that encourage us to systematically ask what expressions we can 
accurately evaluate, including by allowing ourselves additional "black boxes" as building blocks. 
This is the topic of the next section. 

3 Accurate algorithms for polynomial evaluation 

In this section we give a partial answer to the question "when can a multivariate (real or complex) 
polynomial be evaluated accurately?" These results (except for Section 3.5.3) have been published, 
with completely rigorous proofs, in [13]; we provide here intuitions and proof sketches. 
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To summarize the content of this section, we give (sometimes tight) necessary and sufficient 
conditions for accurate multivariate polynomial evaluation over given domains. These conditions 
depend strongly on the type of arithmetic chosen, specifically on the type of "basic" operations 
allowed, as well as on the domain that the inputs are taken from (and also on whether the inputs 
belong to R n or to C n ). 

Intuitively, accurate evaluation of small quantities is a more complicated issue than accurate 
evaluation of large quantities; thus the "interesting" domains, as we will see, lie arbitrarily close to 
or intersect the variety of the polynomial (the set of points where the polynomial is 0). Evaluation 
on domains that are not of this type (but are otherwise sufficiently well-behaved) is easy (see Section 
3.2). Therefore, the variety plays a necessary role. 

Example 3.1 To illustrate the role of the variety, we use the following example. Consider the 
2-parameter family of polynomials 

Mjk{x) = j ■ x% + x\ ■ xl ■ (j ■ x\ + j ■ xl - k ■ xl) , 

where j and k are positive integers, and the domain of evaluation is R 3 . Assume that we allow only 
addition, subtraction and multiplication of two arguments as basic arithmetic operations, along with 
comparisons and branching. 

When k/j < 3, Mjk{x) is positive definite, i.e., zero only at the origin and positive elsewhere. 
This will mean that Mj^{x) is easy to evaluate accurately using a simple method discussed in Section 
3.2. 

When k/j > 3, then we will show that Mj^x) cannot be evaluated accurately by any algorithm 
using only addition, subtraction and multiplication of two arguments. This will follow from a 
simple necessary condition on the real variety V^Mj^), the set of real x where Mjk{x) = 0, see 
Theorem 3.10. 

When k/j = 3, i.e., on the boundary between the above two cases, Mji c (x) is a multiple of the 
Motzkin polynomial [65]. The real variety V^{Mjk) = {x : \xi\ = \x 2 \ = \x 3 \} of this polynomial 
satisfies the necessary condition of Theorem 3.10, and the simplest accurate algorithm to evaluate 
it that we know of has 8 cases depending on the relative values of \xi ± Xj\. For example, on the 
branch defined by the inequalities x\ — x 3 j < \x\ + x 3 1 A \x2 — x 3 1 < x 2 + £3 1 , the algorithm evaluates 
p using the non-obvious formula 

p(xi,x 2 ,x 3 ) = x\ ■ [4((zi - x 3 ) 2 + (x 2 - x 3 ) 2 + (xi - x 3 )(x 2 - x 3 ))\ 
+ x 3 3 ■ [2(2(xi - x 3 f + 5(x 2 - x 3 )(xi - x 3 ) 2 
+ 5(x 2 - x 3 ) 2 ( Xl - x 3 ) + 2{x 2 - x 3 ) 3 )] 
+ x\ ■ [(xi - x 3 ) 4 + 8(x 2 - x 3 )(xi - x 3 f + (x 2 - x 3 ) 4 

+ 9(X 2 - X 3 f{xi - X 3 f + 8(X 2 - Xg) 3 ^! - S3)] 

+ x 3 ■ [2(x 2 - x 3 )(xi - x 3 )((xi - x 3 ) 3 + (x 2 - x 3 ) 3 
+ 2(x 2 - x 3 )(xi - x 3 ) 2 + 2(x 2 - x 3 ) 2 (xi - s 3 )] 

+ (x 2 - X 3 ) 2 (xi - X 3 ) 2 ((xi - X 3 ) 2 + (s 2 - X 3 ) 2 ) . 

In contrast to the real case, when the domain is C 3 ; Theorem 3.10 will show that M,-^(x) cannot 
be accurately evaluated using only addition, subtraction and multiplication. 

The necessary conditions we obtain for accurate evaluability depend only on the variety of p(x), 
but the variety alone is not always enough. 
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Example 3.2 Consider the irreducible, homogeneous, degree 2d, real polynomial 

p{x) = {x\ d + xf) + (x\ + x 2 2 ){q(x 3 , ...,x n )) 2 , 

where q(-) is homogeneous of degree d — 1. The variety V{p) = {x\ = x<i = 0} satisfies the 
necessary condition for accurate evaluability, but near V(p) the polynomial p(x) is "dominated" by 
(x\ + x|)(<7(x3, x n )) 2 , so accurate evaluability ofp{x) depends on the accurate evaluability ofq(-). 

We may now apply the same principle to q(-), etc., thus creating a decision tree of polynomials. 
Rather than a characterizing theorem, one might expect therefore that, in many cases, the answer 
can only be given by a recursive decision procedure, expanding p{x) near the components of its 
variety and so on. We discuss this more in Section 3.3. 

The rest of Section 3 is structured as follows. In Section 3.1, we formalize the type of algorithms 
we are interested in. Section 3.2 makes rigorous the intuition that accurate evaluation "far from the 
variety" is possible. Section 3.3 considers the traditional model of arithmetic, on "well-behaved" 
domains similar to the ones chosen for the algorithms of Section 2. This model has three basic 
operations: +, — , x, and allows for exact negation. While not sufficient for the accurate evaluation 
everywhere of even simple polynomial expressions like x + y + z, the traditional model is simple 
enough to allow us to give a characterization of accurately evaluable complex polynomials, as well as 
(generally distinct) necessary and sufficient conditions for accurate evaluability of real polynomials 
(sometimes these conditions are identical, and offer a complete characterization). In addition, 
for the real case, we show current progress toward constructing a decision procedure for accurate 
evaluability of real polynomials. Section 3.4 expands the practical scope of our analysis, since 
concluding that a computation is "impossible" is not the end of the story; instead, this begs the 
question of what additional computational building blocks would be needed to make it possible? For 
example, current computers often have a "fused multiply-add" instruction x + y ■ z that computes 
the answer with one rounding error, and there are software libraries that provide collections of 
accurately implemented polynomials needed for certain applications, e.g., computational geometry 
[71]. Given any such a collection of what we will call "black-box" operations (about which we 
assume only a small relative error), we will ask how much larger a set of polynomials can be 
evaluated accurately. 

Finally, Section 3.5 discusses the implications of these results. Firstly, they shed some light 
on the existence of accurate algorithms for linear algebra operations like the ones described in 
Section 2: each such algorithm satisfies NIC (see Section 1, and thus also satisfies the necessary 
condition for accurate evaluability presented in Theorem 3.10). The apparently unrelated classes of 
structured matrices for which efficient and accurate linear algebra algorithms exist share a common 
underlying algebraic structure. Also, there may be other structured matrix classes sharing this 
property and for which accurate algorithms could be built. Secondly, our results show that some 
expressions or classes of problems cannot be accurately evaluated, even with an arbitrary set of 
bounded-degree black-box operations at our disposal. The practical implication of this is that, for 
certain types of problems, the use of arbitrarily high precision is necessary (see Section 4). Lastly, 
but perhaps most importantly, our results lay down a path toward the ultimate goal: a decision 
procedure (or "compiler") which, given as inputs a polynomial p, a domain T>, and (perhaps) a 
set of black-box operations, either produces an accurate algorithm for the evaluation of p on P 
(including how to choose the machine precision e for the desired relative error rj, see Section 3.1), 
or exhibits a "minimal" set of black-box operations that are still needed. 



16 



3.1 Formal statement and models of algorithms 

We formalize here both the problem and the models of algorithms we will use. We introduce 
the notation p CO mp{x,5) for the output of the algorithm, and 6 = (5\, 62, §k) for the vector of 
rounding errors. 

For example, consider the algorithm that computes p(x) = X\ + X2 + £3 by performing two 
additions: first adds x\ to X2, then adds the result to x 3 . If the first and second additions introduce 
the relative errors 5±, respectively 62, we obtain that, for this algorithm, 

Pcom P (x,5) = ((xi +x 2 )(l + 6i) + x 3 ) (l + <y 2 ) 

= (X! + X 2 + X 3 )(l + 5 2 ) + (X! + X 3 )5!(l + 5 2 ) . (4) 

We give below a formal description of the algorithms we consider. For more in-depth discussion 
of these assumptions and comparisons with other models of computations, see Section 4. 

Definition 3.3 All algorithms considered in this section will satisfy the following constraints. 

1. The inputs x are given exactly, rather than approximately. 

2. The algorithm always computes the output p comp (x , 5) infinitely many steps and, moreover, 
computes the exact value of p(x) when all rounding errors 5 = 0. This constraint excludes 
iterative algorithms which might produce an approximate value ofp(x) even when 5 = 0. Some 
of the reasons for this choice can be found in Section 2.2. 

3. The basic arithmetic operations beyond the traditional addition, subtraction and multipli- 
cation, if any, must be given explicitly. We refer to the case when additional polynomial 
operations are included as extended arithmetic. Constants are available to our algorithms 
only in the extended model and are also given explicitly. 

4- We consider algorithms both with and without comparisons and branching, since this choice 
may change the set of polynomials that we can accurately evaluate. In the branching case, 
note that p comp (x , 5) will actually be piecewise polynomial. 

5. If the computed value of an operation depends only on the values of its operands, i.e., if 
the same operands x and y of op(x,y) always yield the same 5 in rnd(op(x,y)) = op(x,y) ■ 
(1 + 5), then we call our model deterministic, else it is nondeterministic. One can show that 
comparisons and branching let a nondeterministic machine simulate a deterministic one, and 
subsequently restrict our investigation to the easier nondeterministic model. 



Finally, we must formalize what type of domains we consider. Though, in principle, any semi- 
algebraic set T> could be examined, for simplicity we consider open domains T>, especially T> = M n 
or T> = C n . We can now give the formal definition of accuracy. 

Definition 3.4 We say that p CO mp(x,5) is an accurate algorithm for the evaluation of p(x) for 
x G £> if 

V < i] < 1 ... for any rj = desired relative error 
3 < e < 1 ... there is an e = machine precision 
V x € T> ...so that for all x in the domain 
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V \Si\ < e ... and for all rounding errors bounded by e 

\Pcomp( x ,5) — p( x )\ < V ' \p( x )\ ■■■ the relative error is at most n. 



Note that the algorithm proposed above, which produces the p com p given in (4) for the evaluation 
of x\ + X2 + £3 is not an accurate algorithm (consider the case when x\ + x<i = —X3). This is not 
accidental (see Theorem 3.10). 

Given an algorithm producing a polynomial p C omp, the problem of deciding whether it is accurate 
is a Tarski-decidable problem [64, 74]. What is unclear if whether the existence of an accurate 
algorithm for a given polynomial and domain is a Tarski-decidable problem, since we see no way 
to express "there exists an algorithm" in the required format. 

3.2 The bounded from below case (empty variety) 

We consider the simpler case where the polynomial p(x) to be evaluated is bounded (in absolute 
value) above and below, in an appropriate manner, on the domain T> (this is what we referred to 
previously as "far from the variety", i.e., the set where the polynomial is 0). If the domain T> is 
compact, we give here, with proof, the following theorem. (We let T> denote the closure of T>.) 

Theorem 3.5 Let p comp (x , 8) be any algorithm computing p{x) satisfying p comp (x,0) =p(x), i.e., 
it computes the right value in the absence of rounding error. Let p m in '■= hif^gg Suppose T> 

is compact and p m i n > 0. Then p comp (x,5) is an accurate algorithm for p(x) onV. 

Proof: Since the relative error on T> is 



it suffices to show that the right hand side numerator approaches uniformly as 5 — > 0. This 
follows by writing the value of p C omp{x, 5) along any branch of the algorithm as 



where a > is a multi-index with at least one component exceeding 0. By compactness of T>, all 
p a are bounded on T>, and thus there exists some constant C > such that 



The right hand side goes to uniformly as the upper bound e on each |<5j| goes to zero. □ 

What about domains that are not compact, e.g., not bounded? The proof above points to some 
of the issues that may occur: ratios p a (x) / p(x) could become unbounded, even though p m %n > 0. 
Another way to see that requiring p m i n > is not enough is to consider the polynomial 



To evaluate this polynomial accurately, intuitively, one needs to evaluate (xi + X2 + X3) 2 accurately, 
once it is sufficiently large. If one uses only addition, subtraction, and multiplication, this is not 
possible. (These considerations will be made explicit in Section 3.3.3.) 



\Pcomp{x,5) - p(x)\/\p(x)\ < \Pcomp(x,5) ~p{x)\/p, 




a>0 



a>0 a>0 



p(x) = 1 + (xi + X 2 + x 3 ) 2 . 
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There are, however, cases in which unboundedness is not an impediment. Consider the case of 
a homogeneous polynomial p(x), to be evaluated on a homogeneous domain T> (i.e., a domain with 
the property that x G T> implies 7X € V, for any scalar 7). Due to the homogeneity of p, we can 
then restrict our analysis to V n S n ~ l (the unit ball in W 1 ), or V n 5 2n_1 (the unit ball in C n ). On 
such domains we can use a compactness argument, as we did before, to obtain: 

Theorem 3.6 Let p(x) be a homogeneous polynomial, let V be a homogeneous domain, and let S 
denote the unit ball in M. n (or C n ). Let 

Pmin.homo = IpO^) I 

xevns 

Then p{x) can be evaluated accurately if Pmin,homo 

> 0. 

A simple, Horner-like scheme that provides an accurate p C omp(x,S) in this case is given in [13], 
along with a proof. 



3.3 Traditional arithmetic 

In this section we consider the basic or traditional arithmetic over the real or complex fields, with 
the three basic operations {+, — , x }, to which we add negation. The model of arithmetic is governed 
by the laws in Section 3.1, and has also been described in Section 2. We remind the reader that 
this arithmetic model does not allow the use of constants. 

Section 3.3.1 describes the necessary condition for accurate evaluability over both real and 
complex domains. Sections 3.3.2, respectively 3.3.3 deal with sufficient conditions for accurate 
evaluability over C n , respectively R n . We show that the necessary and sufficient conditions for 
accurate evaluation coincide in the complex case, in Section 3.3.2. Section 3.3.3 also describes 
progress toward understanding how to construct a decision procedure in the real case. 

Throughout this section, we will make use of the following definition of allowability. 

Definition 3.7 Let p be a polynomial over M n or C n , with variety V(p) : ={x : p(x) = 0}. We 
call V(p) allowable if it can be represented as a union of intersections of hyperplanes of the form 



1. Zi = {x 

2. Sij = {x 

3. Dij = {x 

Lf V(p) is not allowable, we call it unallowable. 



Xi = 0} , (5) 
Xi + Xj = 0} , (6) 
Xi - xj = 0} . (7) 



The word "allowable" in the definition above is used because, as we will see, polynomials with 
"unallowable" varieties do not allow for the existence of accurate evaluation algorithms. 

For a polynomial p, having an allowable variety V(p) is obviously a Tarski-decidable property 
(following [74]), since the number of unions of intersections of hyperplanes (5)-(7) is finite. 
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3.3.1 Necessity: real and complex 

All the statements, proofs, and proof sketches in this section work equally well for both the real 
and the complex case, and thus we will treat them together. 

Throughout this section we will denote the variable space by S € {R n , C n }. 

To state and explain the main result of this section, we need to introduce some additional 
notions and notation. 

Definition 3.8 Given a polynomial p over S with unallowable variety V(p), consider all sets W 
that are finite intersections of allowable hyperplanes defined by (5), (6), (7), and subtract from V(p) 
all those W for which W C V(p). We call the remaining subset of the variety points in general 
position and denote it by G{p). 

If V(p) is not allowable, then from Definition 3.8 it follows that G{p) ^ 0. 

Definition 3.9 Given x € S, define the set Allow(x) as the intersection of all allowable planes 
going through x: 

Aiiow(x) : = (n xeZi Zi) n (n xeSij Sij) n (••.,.. n J),,) , 

with the understanding that 

Allow(x) : =S whenever x ^ Zi, Sij, D{j for all 
Note that Allow(:r) is a linear subspace of S. 

In general, we are interested in the sets Allow(x) primarily when x € G{p). For each such x, 
Allow (x) % V(p), which follows directly from the definition of G(p). 

We can now state the main result of this section, which is a necessity condition for the evalua- 
bility of polynomials over domains. In the following, we denote by Int(Z?) the closure of the interior 
of the domain T>. 

Theorem 3.10 Let p be a polynomial over a domain D £ S, such that V = Int(X'). Let G(p) 
be the set of points in general position on the variety V(p). If Int(P) n G(p) ^ 0, then p is not 
accurately evaluable on T>. 

With a little more work one can see that "failures" are not rare. More precisely, in the same 
circumstances as above, any algorithm attempting to compute p accurately on T> will fail to do so 
consistently on a set of positive measure. 

Corollary 3.11 Let p and T> as before, x E Int(P) n G{p), e > 0, 1 > i] > 0, and p C omp(',o~) be 
the result of an algorithm attempting to compute p on T> with error vector 5. Then there exists a 
set A x arbitrarily close to x and a set As of positive measure in H t := {5 : \5i\ < e} such that 
\Pcomp — p\/\p\ > V when computed at any point y G A x using any vector of relative errors 5 £ A$. 

For the benefit of the reader we give here a sketch of the proof of Theorem 3.10 in an informal 
style. Details and rigorous statements can be found in [13]. 

Proof: Theorem 3.10 The essential idea is to consider under what kind of circumstances can an 
algorithm in which every non-trivial operation introduces errors actually produce a perfect 0. Note 
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that, by definition, for an algorithm to be accurate, it must compute p{x) exactly when x S V(p), 
and it cannot output for any x £ V(p). 

For starters, think of the algorithm as in [l]-as a directed acyclic graph (DAG) with input, 
computational, branching, and output nodes. Every computational node has two inputs (which 
may both come from a single other computational node). All computational nodes are labeled by 
(op(-),5i) with op(-) representing the operation that takes place at that node. It means that at 
each node, the algorithm takes in two inputs, executes the operation, and multiplies the result by 
(1 + 5i). Finally, for every branch of the algorithm, there is a single destination node, with one 
input and no output, whose input value is the result of the algorithm. 

For simplicity, in this sketch we only consider non-branching algorithms. 

Assume that x € G(p) is fixed, and let us examine the algorithm as a function of the error 
variables 5. Some computational nodes in this DAG might do "trivial" work (work that, given the 
input x, outputs for all choices of variables 5). For example, such a node might receive input from 
a single computational node, subtract it from itself, and thus output 0. Note that multiplication 
nodes cannot produce a unless they receive a as an input. 

For all non-trivial computation nodes, the output result is a polynomial of 5 (and thus it will 
only vanish on a set of 8s of measure 0). 

As such, for any x € G(p), there will be a positive measure set A of 5s for which non-trivial 
nodes will not output 0. Let us now choose some 5 in this set and then look at the computational 
output node. Since we assume that the algorithm is accurate, the output node must be 0, therefore 
the output node must be of "trivial" type. Let us track back zeros in the computation, marking the 
nodes where such zeros appear and propagate from. In other words, backward-reconstruct paths 
of zeros that lead to the output of the computation. 

Zeros propagate forward by multiplication, or by the addition/subtraction of identical quanti- 
ties; but how do the first zeros on such paths (from the perspective of the computation) get created? 
A quick analysis shows that there are only three possibilities: either they are sources (zero as an 
input), or come from nodes corresponding to the trivial operation of subtracting an input from 
itself (q(5) — q(S), since the node that computed this input must have been non-trivial), or they 
correspond to the addition or subtraction of two equal source inputs (xi 

We illustrate these possibilities in Figure 2 below. The white nodes are "trivial" nodes, labeled 
with the operation executed there and the error variable; for clarity, we dropped the indices on the 
variables Si, and chosen not to represent certain parts of the graph. The gray nodes are non-trivial 
nodes. Arrows are labeled with the value they carry. Rectangles represent source nodes, and the 
triangle is the final output node. 

The key observation is that all of these zeros would be preserved if we replaced x with any 
y € Allow(x). In other words, if the algorithm outputs p CO mp(x, 5) = 0, for some 5 £ A, then it will 
also output Pcomp(y, 5) = 0, for all 5 £ A, and all y £ Allow (a;). 

For example, assume that the polynomial in Figure 2 is 



with unallowable variety V(p) = {x\ + £4 + xq = 0} n {X2 = 0} f~l {X3 = £5}, and that we want 
to compute p at x = (1,0,2,3,2,-4) 6 G(p). Then the result of the computation would be 
correct: p CO mp(x,5) = 0. However, this algorithm would also output p CO mp(y,8) = for the point 
y = (1, 0, 2, 3, 2, 4), which is in Allow(x) = {x2 = 0} n {£3 = £5}, but not in V(p), since p(y) = 16. 

Since x € G(p), Allow(x) ^ V(p), and thus the algorithm obtains on points not in the variety, 
hence it fails. □ 
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Figure 2: The three ways to produce zeros. 



3.3.2 Sufficiency: the complex case 

Suppose we now restrict input values to be complex numbers and use the same algorithm types and 
the notion of accurate evaluability from the previous sections. By Theorem 3.10, for a polynomial p 
of n complex variables to be accurately evaluable over C n it is necessary that its variety V(p) : ={z G 
C n : p(z) = 0} be allowable. 

We give and explain here a result that shows that this condition is also sufficient. This charac- 
terization is possible in the complex polynomial case because complex varieties are (pun intended) 
much simpler than real ones. In particular, Theorem 3.13 has no correspondent for real varieties, 
and therefore we cannot prove anything close to Theorem 3.12 for the real polynomial case. 

Theorem 3.12 Let p : C n — > C be a polynomial with integer coefficients and zero constant term. 
Then p is accurately evaluable on T> = C n if and only if the variety V{p) is allowable. 

To prove this, we first investigate allowable complex varieties. We start by recalling a basic 
fact about complex polynomial varieties (Theorem 3.13), which can for example be deduced from 
Theorem 3.7.4 in [75, page 53]. Let V denote any complex variety. To say that dim<c(y) = k means 
that, for each z (zV and each 5 > 0, there exists w € VDB(z, 5) such that w has a ^-neighborhood 
that is homeomorphic to a real 2/c-dimensional ball. 

Theorem 3.13 Let p be a non-constant polynomial overC n . Then 

dimc(y(p)) =n — l. 

Corollary 3.14 Let p : C n — ► C be a non-constant polynomial whose variety V(p) is allowable. 
Then V(p) is a union of allowable hyperplanes. 
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Proof: Since V(p) is allowable, let V(p) = UjSj be the (minimal) way to write V(p) as an 
irredundant union of irredundant intersections of hyperplanes. Assume that, for some jo, Sj Q is 
not a hyperplane but an (irredundant) intersection of hyperplanes. Let z € Sj \ L)j^j Sj. Then, 
for some 5 > 0, B(z, 5) fl V{jp) C Sj . Since dhac(Sj ) < n — 1, no point in B(z,5) f) V(p) has a 
y(p)-neighborhood that is homeomorphic to a real 2(n — l)-dimensional ball. Contradiction. □ 

Corollary 3.15 If p : C n — > C is a polynomial whose variety V(p) is allowable, then it is a product 
p = cY\jPj, where each pj is a power of Xi, (xi — Xj), or (xi + Xj). 

Proof: By Corollary 3.14, the variety V(p) is an irredundant union of allowable hyperplanes. 

Choose a hyperplane H in that union. If H = Zj for some Jo, expand p into a Taylor series 
in Xj . If H = Di j (or H = Si Q j Q ) for some io, jo, expand p into a Taylor series in (xj — Xj ) (or 
(xi + Xj )). In this case, the zeroth coefficient of p in the expansion must be the zero polynomial 
in Xj, j ^ jo (or j ^ {^o, jo})- Hence there is a k such that p(x) = Xj p(x) in the first case, or 
p(x) = (xi ±xj ) k p(x) in the second (third) one. In any case, we choose k maximal, so that V(p) 
does not include H . 

It is easy to see that the variety V(p) must include V(p) \ H (the union of all the other 
hyperplanes) - whose dimension is n — 1. Moreover, V(p) (by Theorem 3.13) has dimension n — 1 
and, by the maximality of k, does not include H. 

If V(p) n H : =H' were non-empty, it would follow that dim(U') < n — 2 (since it is included 
in the hyperplane H, and strictly smaller than H). This would contradict Theorem 3.13, which 
states that dim(V(p)) = n — 1. Therefore it must be that V(p) fl H = 0, and thus V(p) must equal 
V(p) \ H, the union of a smaller number of allowable hyperplanes. 

Proceed inductively by factoring p in the same fashion. □ 

The crucial point in the proof above is that the V{p) fl H must be 0, due to Theorem 3.13. 
The same argument would break, in the real case; to illustrate this, consider the polynomial 
p{x\,X2, X3) = x\+x\ (X2+X3) 2 . The variety is V(p) = {x% = 0}, of dimension 2, but after factoring 
out x\ , the variety of the remaining polynomial, p = x\ + (x% + X3) 2 , is \x\ = 0} U {X2 + X3 = 0} - 
which has dimension 1. We can now prove Theorem 3.12. 

Proof of Theorem 3.12. By Corollary 3.15, p = c JJ -pj, with each pj a power of Xk or (x^ ±X(). 
It also follows that c must be an integer since all coefficients of p are integers. Since each of the 
factors is accurately evaluable, and we can get any integer constant c in front of p by repeated 
addition (followed, if need be, by negation), which are again accurate operations, the algorithm 
that forms their product and then adds/negates to obtain c evaluates p accurately. □ 

Theorem 3.12 implies that only homogeneous polynomials are accurately evaluable over C n . 

3.3.3 Sufficiency: toward a decision procedure for the real case 

In this section we relate the accurate evaluability of a polynomial to the accurate evaluability of 
its "dominant terms", and explore a possible avenue toward a decision procedure to establish the 
former via a recursive/inductive procedure based on the latter. 

We consider only homogeneous polynomials, for reasons outlined in Section 3.2, and we also 
consider separately the branching and non-branching cases. Most of the section is devoted to non- 
branching algorithms, but we do need branching for our statements at the end; we keep the reader 
informed of all changes in the assumptions. 
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To accurately compute a homogeneous polynomial of degree d using a non-branching algorithm, 
one needs to use a homogenous algorithm, described by the following definition and lemma, to be 
used later in Section 3.3.5. 

Definition 3.16 We call an algorithm p comp (x, 5) with error set 5 for computing p(x) homogeneous 
of degree d if 

1. the final output is of degree d in x; 

2. no output of a computational node exceeds degree d in x; 

3. the output of every computational node is homogeneous in x. 

Lemma 3.17 If p(x) is a homogeneous polynomial of degree d and if a non-branching algorithm 
evaluates p{x) accurately by computing p CO mp{x,5), the algorithm must itself be homogeneous of 
degree d. 

The proof involves a combination of expressing the relative errors \p C omp(x,5) — p{x)\/\p(x)\ as 
in the proof of Theorem 3.5, and an analysis of the algorithm as a DAG, as in Section 3.3.1. 
Due to the complexity of the issues, the rest of this section is subdivided into four parts: 

• Section 3.3.4 makes rigorous the notion of dominance and explains how to find the dominant 
terms by using various simple linear changes of variables. 

• In Section 3.3.5, we explain how to "prune" an algorithm to manufacture an algorithm that 
evaluates one of its dominant terms, and we establish that accurate evaluation of the dominant 
terms identified in Section 3.3.4 is necessary for the accurate evaluation of the polynomial. 

• Section 3.3.6 establishes that accurate evaluation of a special set of dominant terms, together 
with the slices of space where they dominate, is sufficient for accurate evaluation of the 
polynomial. 

• Finally, Section 3.3.7 discusses obstacles to a complete inductive procedure. 
3.3.4 Dominance 

We now describe what we mean by "dominant terms" of the polynomial. Given an allowable 
variety V(P), we fix an irreducible component of V{p). Any such component is described by linear 
allowable constraints. We note (see [13]) that any given component of V(p) can be put into the 
form x\ = X2 = ... = Xfc = using what we call a standard change of variables; standard changes of 
variables are linear transformations of the variables, which are intuitively simple, but whose exact 
combinatorial definition is long and we choose to leave it out. 

After a standard change of variables, we look at the component x\ = X2 = ... = Xk = 0. We 
can assume that the polynomial p(x) can be written (almost following MATLAB notation) as 



where we write xn.^i := (x±, Xk), x\k+i :n \ := ( x k+ii x n)- Also, we let A be the set of all 
multi-indices A := (Ai, Xk) appearing above. 




agA 
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To determine all dominant terms associated with the component x\ = X2 = ■ ■■ = x^ = 0, 
consider the Newton polytope P of the polynomial p with respect to the variables x\ through Xk 
only, i.e., the convex hull of the exponent vectors A € A (see, e.g., [57, p. 71]). Next, consider the 
normal fan N(P) of P (see [79, pp. 192-193]) consisting of the cones of all row vectors rj whose dot 
products with x G P are maximal for x on a fixed face of P. That means that for every nonempty 
face F of P we take 

k 

N F : ={q = (m, ... ,n k ) G (M fe ) : F C {x G P : r]x{: = rijXj) = maxr/y}} 

r— f y&P 

and 

iV(P) : ={iV F : F is a face of P}. 

Finally, consider the intersection of the negative of the normal fan —N{P) and the nonnegative 
quadrant . This splits the first quadrant into several regions 5a according to which subsets 
Aj of exponents A "dominate" close to the considered component of the variety V(p), in the following 
sense: 

Definition 3.18 Let Aj be a subset of A that determines a face of the Newton polytope P of p such 
that the negative of its normal cone —N(P) intersects (M fc )+ non-trivially (not only at the origin). 
Define Saj G (R fc )+ t° be the set of all nonnegative row vectors n such that 

rjXi = r]\2 < rjX, VAi, A2 G Aj, and A G A \ Aj . 

Note that if x\ through x k are small, then the exponential change of variables Xj 1— > — log \xj\ 
gives rise to a correspondence between the nonnegative part of —N(P) and the space of original 
variables in.).]. We map back the sets 5a- into a neighborhood of in M fc by lifting: 

Definition 3.19 Let Fa 3 C [—1, l] k be the set of all points x^.u G M. k such that 

n:=(-log\xi\,...,-log\x k \) G S Aj . 

For any j, the closure of F^ contains the origin in R fc . Given a point x\\,u G F\., and given 
r) = (ni, Hi-, ■ ■ ■ , n&) G Sa^-, for any t G (0,1), the vector {x\t ni , . . . , Xkt nk ) is in Fa-. Indeed, if 
(— log |xi|, . . . , — log G 5a-, then so is (— log |xi|, . . . , — log \xk\) — log \t\n, since all equalities 
and inequalities that define S Aj will be preserved, the latter because log \t\ < 0. 

Example 3.20 Consider the following polynomial: 

p(x\, X2, X3) = + x\x\x^ + X 1 X Z? + x l x 2^ + 2 -i° a: '2' Z '3- 

This polynomial is positive and easy to evaluate accurately; the reason we have chosen it is to 
illustrate the Newton polytope, its normal fan, and the sets F Aj and S\. defined above. 
For this example, 

V(p) = {xi = x 2 = 0} U {xi = x 3 = 0} U {x 2 = x 3 = 0} . 

We examine the behavior of the polynomial near the x\ = X2 = component of the variety (i.e., 
we consider x% to be large). Note that only the first three monomial terms, x\x^ , xlx^x^ 4 , andxfx^ 2 
will play an important role, since if x\,X2 <C I, xfx^ 4 <C X 2 X \ 2 , respectively, x^x^x^ <C xfx\ 2 . 

We show below the Newton polytope P of p with respect to the variables x\, X2, its normal fan 
N(P), the intersection —N(P) n R\, the regions S\., and the regions F\.. 
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Figure 3: The Newton polytope P and its Figure 4: The intersection —N(P) n and 
normal fan N(P) for Example 3.20. the regions S\.. 




-1 -0.5 0.5 1 1.5 2 

Figure 5: The regions F\ } . 



Definition 3.21 We define the dominant term of p(x) corresponding to the component x\ = ■ ■ ■ = 
Xk = and the region F\- by 

Pdo mj {x): = ^ c A^[i :fc ]<?A(z[fc+l:n]) • 

The following observations about dominant terms are immediate. 
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Lemma 3.22 Let rj = (ni, . . . ,rik) G 5^ and let dj : = X^A eA ^i n i- Let x° be fixed and let 
x(t):=(xi(i),...,x„(t)), Xj (t): = | j = 1,' . .'. , n. 



J 

Then pdom j( x (t)) has degree dj in t and is the lowest degree term of p(x(t)) in t, that is 

p(x(t)) = pdomj (x(t)) + o(t dj ) as t -> 0, deg t Pdom. (x(t)) = dj. 

Corollary 3.23 Under the assumptions of Lemma 3.22 suppose that Pdomj(%°) 7^ 0. Then 

.. Pdom 3 (x(t)) 

hm . . ,, — = 1. 

t^o p{x(t)) 

The next question is whether the term Pdom^ dominates indeed the remaining terms of p in the 
region F^. in the sense that Pdomj (x) / p(x) is close to 1 sufficiently close to x\ = ■ ■ ■ = x\. = 0. 
Indeed, we show that each dominant term Pdom^ such that the convex hull of Aj is a facet of the 
Newton polytope of p and whose variety V(pdomj) does not have a component strictly larger than 
the set xi = ■ • • = Xfc = dominates the remaining terms in p, not only in F Aj , but in a certain 
slice F\. around F Aj . These dominant terms, corresponding to larger sets Aj , are the useful ones, 
since they pick up terms relevant not only in the region F Aj but also in its neighborhood. 

In Example 3.20 above, the useful dominant terms correspond to the regions -^{(2, 2), (8,0)} an d 
-P{(2,2),(o,8)} (the only relevant edges of the polygon). This points to the fact that we should 
be ultimately interested only in dominant terms corresponding to the facets, i.e., the highest- 
dimensional faces, of the Newton polytope of p. Note that the convex hull of Aj is a facet of the 
Newton polytope N if and only if the set S/\_. is a one-dimensional ray. 

The next lemma will be instrumental for our results in Section 3.3.6. It shows that each 
dominant term pdomj such that the convex hull of Aj is a facet of the Newton polytope of p and 
whose variety V(pdomj) does not have a component strictly larger than the set x% = ■ ■ ■ = = 
indeed dominates the remaining terms in p in a certain "slice" F\. around F^. . 

Lemma 3.24 Let Pdomj be the dominant term of a homogeneous polynomial p corresponding to the 
component x\ = ■ ■ ■ = xt = of the variety V(p) and to the set Aj whose convex hull is a facet of 
the Newton polytope N. 

Let Saj be any closed pointed cone in (M fc )+ with vertex at that does not intersect other one- 
dimensional rays S\ r I 7^ j, and contains \ {0} in its interior. Let F\. be the closure of the 
set 

{x[i-.k] e [-1,1]* : (-log|xi|,...,-log|x fc |) G S Aj }. (8) 

Suppose the variety V(pd om ) of Pdom is allowable and intersects F\. only at 0. Let \\ ■ \\ be any 
norm. Then, for any 5 = 5(j) > 0, there exists e = e(j) > such that 



Pdomj ( x [l:k] ; x [k+l:n]) 
P{ x [l:k]> x [k+l:n]) 



< 5 whenever - — — - < e and X[ 1: fc] G F\ j . (9) 

|p[fe+l:n] II 
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For a proof of Lemma 3.24, the reader is referred to [13]. 

The above discussion of dominance was based on the transformation of a given irreducible 
component of the variety to the form x\ = ■ ■ ■ = Xk = 0. We must reiterate that the identification 
of dominant terms becomes possible only after a suitable change of variables C is used to put a 
given irreducible component into the standard form x± = ■ ■ ■ = x k = and then the sets A,- are 
determined. Note however that the polynomial Pdomj is given in terms of the original variables, 
i.e., as a sum of monomials in the original variables x q and sums/differences x q ±x r . We therefore 
use the more precise notation pdomj ,c m the rest of this section. 

Definition 3.25 Without loss of generality we can assume that any standard change of variables 
has the form 

x = { x [l:ki]> x [ki+l:k2]-> ■ ■ ■ ^[iki-i+l:*!]) ^ X = (x^. kl ] , £[jfc 1+ l : fe 2 ] , . . . , Xy kl _ l+1 . k ^ ), 

where x km+1 :=x km+1 , x km+2 .=x km+2 - o- km+2 x km+1 , 
Xk m+1 ■ = x k m+1 ~ o- km+1 x km+1 , k : =0, a r = ±1 for all pertinent r . 

Note also that we can think of the vectors n 6 S\j as being indexed by integers 1 through k[, 
i.e., n = (n\, . . . ,n kl ). Moreover, to define pruning in the next subsection we will assume that 

n k m +i < n r for all r = k m + 2, . . . , k m +i and for all m = 0, — 1. (10) 
3.3.5 Pruning 

We show here how to convert an accurate algorithm that evaluates a polynomial p into an accurate 
algorithm that evaluates a selected dominant term Pdom ,c- This will imply that being able to 
evaluate dominant terms accurately is a necessary condition for being able to evaluate the original 
polynomial accurately. 

This process, which we will refer to as pruning, will consist of deleting some vertices and edges 
and redirecting certain other edges in the DAG that represents the algorithm. We explain the 
pruning process informally and through an example; for the rigorous definition, see [13]. 

Starting at the sources, we process each node provided that both of its inputs have been pro- 
cessed already (acyclicity insures that this can be done). Then, at any node u which performs 
an addition or subtraction of two inputs from nodes v and w of different degrees, we delete the 
node and the in-edge from the input of smaller degree (say v ) and redirect the out-edge from u 
to w (the node with the larger degree output). Then we go backward and delete all nodes and/or 
edges on that sub-DAG, up to the source nodes. We denote the output of the pruned algorithm by 

Pdom,j,C,cornp(. x i 

We illustrate this process below. 
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Figure 6: Pruning an algorithm for p(x) = x\x\ + (x2 — X3) 4 + (#3 — x^x 2 . 

Example 3.26 Figure 6 shows an example of pruning an algorithm that evaluates the polynomial 

x\x\ + (x 2 - x 3 ) 4 + (x 3 - x 4 ) 2 xl 

using the substitution 

(tXl,X2,tX3 + X 2 , tX4 + X 2 , X5) 

near the component 

X\ = 0, X2 = X3 = X4. 

The result of pruning is an algorithm that evaluates the dominant term 

2 2,/ \2 2 

x l x 2 + K x 3 ~ x ±) X 5 . 

The node A has two sub-DAGs leading to it; the right one (going back to the sources X2 and X3) 
is pruned due to the fact that it computes (X2 — X3) 4 , a quantity of order 0(t 4 ), whereas the other 
produces x\x\, a quantity of order 0(t 2 ). 

The output of the original algorithm is given by 

Pco mp (x,5) = [(cc?(l + 5i)x|(l + £ 2 )(l + <5 3 ) 

+(x 2 - x 3 ) 4 (l + <5 4 ) 4 (1 + <5 5 ) 2 (1 + S 6 )] (1 + S 7 ) 

+ [(X3 - x 4 ) 2 (l + 5s) 2 (l + S 9 )xl(l + <y 10 )(l + 6u)] (1 + 6 12 ). 
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The output of the pruned algorithm is 

Pdom^Ccompix^) = \x\x\ (1 + 6 1 ) (1 + 6 2 ) (1 + S 3 ) ) (1 + S 7 ) + (x 3 - X^fx\ 

x (1 + 5 8 ) 2 (1 + 5 9 )(1 + <y 10 )(l + S n )] (1 + 6 12 ). 

We formalize the main result regarding the pruning process below. 

Theorem 3.27 Suppose a non-branching algorithm evaluates a polynomial p accurately on R n by 
computing p comp (x , 5) . Suppose C is a standard change of variables (as in Definition 10) associated 
with an irreducible component ofV{p). Let Pdom C be one of the corresponding dominant terms 
of p and let 5V satisfy (10). Then the pruned algorithm with output Pdom,j,c,comp(x,d~) evaluates 
Pdomj,c accurately on M. n . In other words, being able to compute all such Pdom^c for all components 
of the variety V(p) and all standard changes of variables C accurately is a necessary condition for 
computing p accurately. 

3.3.6 Sufficiency of evaluating dominant terms 

Our next goal is to prove a converse to Theorem 3.27; however, strictly speaking, the results that 
follow do not provide a true converse, since branching is needed to construct an algorithm that 
evaluates a polynomial p accurately from algorithms that evaluate its dominant terms accurately. 
Recall that Theorem 3.27 involves non-branching algorithms. 

We make two assumptions: that our polynomial p is homogeneous and irreducible. The lat- 
ter assumption effectively reduces the problem to that of accurate evaluation of a nonnegative 
polynomial, due to the following lemma. 

Lemma 3.28 If a polynomial p is irreducible and has an allowable variety V(p), then it is either 
a constant multiple of a linear form that defines an allowable hyperplane, or it does not change its 
sign in W 1 . 

Hence, we can restrict ourselves to the case of a homogeneous, irreducible, non-negative poly- 
nomial over the entire M. n . For this case, we have the following theorem. 

Theorem 3.29 Let p be a homogeneous nonnegative polynomial whose variety V(p) is allowable. 
Suppose that all dominant terms Pdomj ,c f or a ^ components of the variety V(p), all standard changes 
of variables C and all subsets Aj satisfying (10) are accurately evaluable. Then there exists a 
branching algorithm that evaluates p accurately overW 1 . 

Proof: Theorem 3.29 We first show how to evaluate p accurately in a neighborhood of each 
irreducible component of its variety V(p). We next evaluate p accurately off these neighborhoods 
of V(p). The final algorithm will involve branching depending on which region the input belongs 
to, and the subsequent execution of the corresponding subroutine. 

Consider a particular irreducible component Vo of the variety V(p); using a standard change 
of variables C, we map Vo to a set of the form x\ = ■ ■ ■ = = 0. We create an e-neighborhood 
of Vo where we can evaluate p accurately; this neighborhood is built up from semi-algebraic e- 
neighborhoods. More precisely, for each Vo, we can find a collection (Sj) of semi-algebraic sets, all 
determined by polynomial inequalities with integer coefficients, and the corresponding numbers €j, 
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so that the polynomial p can be evaluated with desired accuracy r\ in each €j -neighborhood of Vq 
within the piece Sj. Moreover, testing whether a particular point x is within ej of Vq within Sj can 
be done by branching based on polynomial inequalities with integer coefficients. 

The final algorithm will be organized as follows. Given an input x, determine by branching 
whether x is in Sj and within the corresponding e,- of a component Vq. If that is the case, evaluate 
p(x) using the algorithm that is accurate in Sj in that neighborhood of Vo- For x not in any of 
the neighborhoods, evaluate p by Horner's rule. Since the polynomial p is strictly positive off the 
neighborhoods of the components of its variety, the reasoning of Section 3.2 applies, showing that 
the Horner's rule algorithm is accurate. If x is on the boundary of a set Sj, any applicable algorithm 
will do, since the inequalities we use are not strict. Thus the resulting algorithm for evaluating p 
will have the desired accuracy rj. □ 

3.3.7 Obstacles to a complete inductive procedure 

The results of the previous sections suggest the existence of an inductive procedure that could 
be used to determine whether or not a given polynomial is accurately evaluable by reducing the 
problem for the original polynomial p to the same problem for its dominant terms, then their 
dominant terms, and so forth, going all the way to "base" cases: monomials or other polynomials 
that are easy to analyze. In order to work, the dominant terms would have to be simpler, or smaller, 
by some measure, than the original polynomial; this would require finding an induction variable 
that gets reduced at each step. 

The most obvious two choices are the number of variables or the degree of the polynomial 
under consideration; unfortunately, there are cases when both fail to decrease. Furthermore, the 
dominant term may even coincide with the polynomial itself. For example, if 

p(x) = A{x^:n])x\ + B(x [3 . n ])xiX2 + C(x [3 . n] )xl 

where A, B, C are nonnegative polynomials in x% through x n , then the only useful dominant term 
of p in the neighborhood of the set x\ = x<i = is the polynomial p itself. For this case, analyzing 
the dominant term yields no progress whatsoever. 

Another possibility is induction on domains or slices of space, but we do not yet envision how 
to make this idea precise, since we do not know exactly when a given polynomial is accurately 
evaluable on a given domain. 

Further work to establish a full decision procedure is therefore highly desirable. 

3.4 Extended arithmetic 

In this section, we consider adding "black-box" real or complex polynomial operations to the basic, 
traditional model. We describe this type of operations below. 

Definition 3.30 We call a black-box operation any type of operation that takes a number of inputs 
(real or complex) x±,...,Xk and produces an output q such that q is a polynomial in x±, . . . , x/.. 

Example 3.31 q(xi, X2, £3) = x\ + £2^3 • 

Note that +, — , and • are all black-box operations on two inputs. 

Consider a fixed set of multivariate polynomials {qj : j € J} with real or complex inputs 
(perhaps infinite). In the extended arithmetic model, the operations allowed are the black-box 
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operations q±, . . . ,qk, and negation. With the exception of negation, which is exact, all the others 
yield rnd(op(ai, . . . , a{)) = op(a\, . . . , oj)(l + S), with \5\ < e (e here is the machine precision). We 
consider the same arithmetical models as in Section 3.1, with this extended class of operations. 

3.4.1 Necessity: real and complex 

In order to analyze the way in which the necessity condition for having an allowable variety (The- 
orem 3.10) changes under these extended assumptions, we need to introduce a new, more general 
definition of allowability. 

Essentially, a black box for computing p can be used for computing other polynomials, namely 
all the polynomials obtainable from p via permuting, repeating, negating, and zeroing some subset 
of the variables. Therefore each black box accounts for a potentially larger set of polynomials that 
can be evaluated with a single rounding error, using that black box, and we must consider all of 
them in our analysis. Note that in the traditional case (when we had addition, subtraction, and 
multiplication of two numbers as our black boxes) our set of three operations was closed under the 
aforementioned changes. 

The definition below formalizes the set of polynomials obtainable from a given one, through 
this process of negation, repetition, permutation, and zeroing of variables. 

Recall that we denote by S the space of variables (which may be either W 1 or C n ). From now 
on we will denote the set {1, . . . , n} by /C, and the set of pairs 6 fC x /C such that i < j by 
K\. 

Definition 3.32 Let p(x\, . . . , x n ) be a multivariate polynomial over S with variety V(p). Let 
Kz C K, and let JCd,K.s C /C< . Modify p as follows: impose conditions of the type Z\ for each 
i € JCz, and of type Dij, respectively Sij, on all pairs of variables in Kd, respectively K,s- Rewrite 
p subject to those conditions (e.g., set Xj = for all i € K-z), end denote it by p, and denote by Kr 
the set of remaining independent variables (use the convention which eliminates the second variable 
in each pair in JCd or fCs )■ 

Choose a set T C JCr, and let 

Vt,k z ,k d ,k. s (p) = ri a V(q a ) , 

where the polynomials q a are the coefficients of the expansion of p in the variables xt: 

p{x u ...,x h ) = ^2q a x? , 

a 

with q a being polynomials in x^ R ^ T only. 

Finally, let ICn be a subset of)Cji\T. We negate each variable inJCjy, and letVT t ic Z: ic D ,K,s,K-N (P) 
be the variety obtained from Vt,k. z ,k. d ,ic s (p), with each variable in ICn negated. 

For simplicity, we denote a set (T,)Cz,ICd,ICs,JCn) by 1. 
We illustrate this process by the following example. 

Example 3.33 Let p(x,y,z) = x + y ■ z (the fused multiply- add). We record below some of the 
possibilities for the subvarieties Vj{p); the sets I = (T, fCz, ICd, K-Si ^Qv) are implicit. 

oV(p{x,0,z)) ={x = 0}, 
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O V(p(x, x, x)) = {x = 0} U {x = -1}, 
O V(p(0,y,z)) ={y = 0}U{z = 0}, 
O V(p(x, y, -x)) = {x = 0} U {y = 1}, 
O V(p(x,y,y)) ={x + y 2 = 0}, 
O V(p(x, y, —z)) = {x — yz = 0}, etc. 

We include the "traditional" operations in the arithmetic by defining g_2 (371,2:2) = x±X2, 
q-i(x\,X2) = x\ + X2, and qo(xi,X2) = x\ — X2, and note that the sets 

1. Zi = {x : Xl = 0} , (11) 

2. Sij = {x : Xi + Xj = 0} , (12) 

3. Dij = {x : Xi — Xj = 0} (13) 

describe all non-trivial sets of type Vj, for g_2, q~i, and go- 

We will assume from now on that the black-box operations qj with j G J (J may be infinite, 
and {—2, —1,0} C J) are given and fixed. 

Definition 3.34 We call any set Vj(qj) with 1 = (T,JCz,}Cd,ICs,}Cn) cls defined above and qj a 
black-box operation basic q-allowable. 

We call any set R irreducible q-allowable if it is an irreducible component of a (finite ) intersec- 
tion of basic q-allowable sets, i.e., when R is irreducible and 

where each Qi is a basic q-allowable set. 

We call any set Q q-allowable if it is a (finite) union of irreducible q-allowable sets, i.e., 

Q = UjRj , 

where each Rj is an irreducible q-allowable set. 

Any set R which is not q-allowable we call q-unallowable. 

Note that the above definition of g-allowability is closed under taking union, intersection, and 
irreducible components. This parallels the definition of allowability for the classical arithmetic 
case - in the classical case, every allowable set was already irreducible (being an intersection of 
hyperplanes). 

Definition 3.35 Given a polynomial p with q-unallowable variety V(p), consider all sets W that 
are q-allowable (as in Definition 3.34), and subtract from V(p) those W for which W C V(p). We 
call the remaining subset of the variety points in general position and denote it by Q{p). 

Since V(p) is ^-unallowable, Q{p) is non-empty. 
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Definition 3.36 Given x G S, define the set q— Allow(x) as the intersection of all basic q- allow able 
sets going through x: 

q-Allow(sc) : = f\ jeJ (n x : xev x ( qj ) v i{lj)) , 
for all possible choices of I. The intersection in parentheses is S whenever x ^ Vj(qj) for all I. 

Note that when x € G(p), q— Allow(x) % G(p). 
We can now state our necessity condition. 

Theorem 3.37 Given the black-box operations {qj : j € J}, and the model of arithmetic described 
above, let p be a polynomial defined over a domain T> C S. Let Q{p) be the set of points in general 
position on the variety V(p). If there exists x 6 T>r\Q{p) such that q— Allow(x) nlnt(P) 7^ 0, then 
p is not accurately evaluable on T>. 

Proof: Theorem 3.37 The proof mimics the proof of Theorem 3.10; once again, we trace back 
zeros to what we now call q- allowable conditions, and make use of the DAG structure of the 
algorithm. In the non-branching case, we obtain that if the algorithm is run on an input x € G(p), 
then either p CO mp(x, 6) 7^ for almost all 5, or p CO mp(y, 5) = for all y € Allow (a;) \ V(p) and for all 
5. The proof for the branching case is again a refinement of the proof for the non-branching one. 
□ 

Note that, if we consider only algorithms without branching, Theorem 3.37 remains true in the 
tighter case when we drop the irreducibility constraint from the definition of allowability. 

We can also show that, arbitrarily close to any point x £ G(p), we can find sets S of positive 
measure such that the relative accuracy of the algorithm when run with inputs in S is either 1 or 
00; a result identical to Corollary 3.11 can also be proved for the extended arithmetic case. 

3.4.2 Sufficiency: the complex case 

In this section we obtain a sufficiency condition for the accurate evaluability of a complex polyno- 
mial, given a black-box arithmetic with operations {qj \ j E J} (J may be an infinite set). 

Throughout this section, we assume our black-box operations include q c , which consists of 
multiplication by a complex constant: q c (x) = c ■ x. Note that this operation is natural, and can 
be performed accurately given only a suitably accurate approximation of c. 

We believe that the sufficiency condition we obtain here is not a necessary one, in general-but it 
does subsume the sufficiency condition we found for the basic complex case with classical arithmetic 
{+,-•}■ 

Theorem 3.38 (General case) 2 Given a polynomial p : C n — > C with V(p) a finite union of 
irreducible varieties Vj(qj), for j € J, and 2 as above, then p is accurately evaluable. 

Theorem 3.39 (Affine case) If all black-box operations qj, j G J are affine, then a polynomial 
p : C n — > C is accurately evaluable iffV(p) is a union of varieties Vj(qj), for j G J and I as in 
Definition 3.32. 

The proofs follow easily from Lemma 3.40. 
2 This condition was stated in a slightly weaker form in [13]. 
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Lemma 3.40 If all varieties Vj(qj)) in the union defined by V(p) are irreducible (in particular, if 
they are affine), then p is a product p = cY\jPj, where each pj is a power of qj or a polynomial 
obtained from qj by repeating, negating, or zeroing some of the variables; c is a complex constant. 
The argument is identical to the one we gave for the proof of Corollary 3.15, and it hinges on the 
irreducibility of the varieties Vx(qj)) in the union. 

Note that Theorem 3.39 is a more general necessary and sufficient condition than Theorem 3.12, 
which only considered having g_2 5 Q-i, an d qo as operations, and restricted the polynomials to have 
integer coefficients (thus eliminating the need for q c ). 

3.5 Numerical linear algebra consequences 

Here we examine the results of Section 2, in light of Section 3. We take another look at Table 1, 
explaining the strong "No" entries there. Those entries mean that no accurate algorithms exist 
even given an arbitrary set of black-box operations of bounded degree or with a bounded number of 
arguments. In other words, arbitrary precision arithmetic is needed for their accurate solution. This 
is the case for Toeplitz matrices because, as discussed earlier, we cannot evaluate their determinants 
accurately, and determinants are necessary to get the indicated entries accurately. Fully off-diagonal 
submatrices of diagonally dominant matrices are completely unstructured matrices, and so with 
irreducible determinants of unbounded degree. The same is true of M-matrices, except that the 
submatrix entries are nonpositive. Minors of submatrices of non-TN Vandermonde have factors 
that are general Schur functions of arbitrary arguments, which can be irreducible of unbounded 
degree. We suspect that many other entries should also be "No" . 

3.5.1 Validation of our results 

If we examine the matrix classes in Table 1 , we see that their determinants are rational functions 
whose sets of zeros and of poles are allowable in traditional arithmetic. By considering numerators 
and denominators of these rational functions separately we see that both can be computed accu- 
rately (and then, provided that the denominator is not 0, their ratio can be computed accurately). 
Incorporating division more formally into our model to identify necessary and sufficient conditions 
for accurate evaluability of rational functions is the subject of ongoing work. 

3.5.2 Negative results: accurate evaluation is impossible 

Here we examine two classes of matrices for which some or all linear algebra operations are im- 
possible given any set of black boxes with a bounded number of arguments: Toeplitz and various 
classes of Vandermonde that we define later. 

We prove our results by reducing the problem of doing accurate linear algebra to that of accu- 
rately evaluating the determinant and certain minors (recall that the latter is a necessary condition 
for the former). What these results say roughly that, if one wants to construct an accurate algo- 
rithm for finding the inverse that works for Toeplitz or Vandermonde matrices as a class, one needs 
to use arbitrary precision (more on this in Section 4). 

We start by examining a more general problem. If the determinants p n {x) = det M nxn (x) 
of a class of n-by-n structured matrices M do not satisfy the necessity conditions described in 
Theorem 3.37 for any enumerable set of black-box operations (perhaps with other properties, like 
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bounded degree), then we can conclude that accurate algorithms of the sort described in the above 
citations are impossible. 

In particular, to satisfy these necessity conditions would require that the varieties V(p n ) be 
allowable (or (/-allowable) . For example, if V is a Vandermonde matrix, then det(^) = Y\i<j( x i~ x j) 
satisfies this condition, using only subtraction and multiplication. 

The following theorem states a negative condition (which guarantees impossibility of existence 
for algorithm using any enumerable set of black-box operations of bounded degree). 

Theorem 3.41 Let M(x) be an n-by-n structured complex matrix with determinant p n {x) as de- 
scribed above. Suppose that for any n, p n (x) has an irreducible factor p n (x) whose degree goes to 
infinity as n goes to infinity. Then for any enumerable set of black-box arithmetic operations of 
bounded degree, for sufficiently large n it is impossible to accurately evaluate p n {x) over the complex 
numbers. 

Proof: Let q±, ■■■,q m be any finite set of black-box operations. To obtain a contradiction, sup- 
pose the complex variety V(p n ) satisfies the necessary conditions of Theorem 3.37, i.e., that V(p n ) 
is allowable. This means that V(p n ), which includes the hypersurface V(p n ) as an irreducible com- 
ponent, can be written as the union of irreducible (/-allowable sets (by Definition 3.34). This means 
that V{p n ) must itself be equal to an irreducible (/-allowable set (a hypersurface), since represen- 
tations as unions of irreducible sets are unique. The irreducible (/-allowable sets of codimension 1 
are defined by single irreducible polynomials, which are in turn derived by the process of setting 
variables equal to one another, to one another's negation, or zero (as described in Definitions 3.32 
and 3.34), and so have bounded degree. This contradicts the unboundedness of the degree of V(p n ). 
□ 

In the next theorems we apply this result to the set of Toeplitz matrices. We use the following 
notation. Let T be an n-by-n Toeplitz matrix, with Xj on the j-th diagonal, so xq is on the main 
diagonal, x n _i is in the top right corner, and x\- n is in the bottom left corner. We give the 
following result without proof; for a proof, see [13]. 

Theorem 3.42 The determinant of a Toeplitz matrix T is irreducible over any field. 

Therefore, for complex Toeplitz matrices, we have the following corollary. 

Corollary 3.43 The determinants of the set of complex Toeplitz matrices cannot be evaluated 
accurately using any enumerable set of bounded- degree black-box operations. 

In the real case, irreducibility of p n is not enough to conclude that p n cannot be evaluated 
accurately, because Vu.(p n ) may still be allowable (and even vanish). So we consider another 
necessary condition for allowability: Since all black boxes have a finite number of arguments, their 
associated codimension-1 irreducible components must have the property that whether x € Vj(qj) 
depends on only a finite number of components of x. Thus to prove that the hypersurface Vjg(p n ) 
is not allowable, it suffices to find at least one regular point x* in Vu.(p n ) such that the tangent 
hyperplane at x* is not parallel to sufficiently many coordinate directions, i.e., membership in 
VM.(pn) depends on more variables than any Vj(qj). This is easy to do for real Toeplitz matrices. 

Theorem 3.44 Let V be the variety of the determinant of real singular Toeplitz matrices. Then 
V has codimension 1, and at almost all regular points, its tangent hyperplane is parallel to no 
coordinate directions. 
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Corollary 3.45 The determinants of the set of real Toeplitz matrices cannot be evaluated accu- 
rately using any enumerable set of bounded- degree black-box operations. 

Proofs of these results can be found in [13]. Corollaries 3.43 and 3.45 imply that accurate linear 
algebra (in the sense of Section 2) is impossible on the class of Toeplitz matrices (either real or 
complex) in bounded precision. 

We consider now the class of polynomial Vandermonde matrices V, where = Pj-\{xi) is a 
polynomial function of Xi, with 1 < i, j < n. This class includes the standard Vandermonde (where 
Pj-i(xi) = x\ 1 ) and many others. 

Consider first a generalized Vandermonde matrix where Pj-\{xi) = x\ 1+An ~ l with < Ai < 
A2 < ••• < \ n . The tuple A = (Ai, A2, A n ) is called a partition. Any square submatrix of 
such a generalized Vandermonde matrix is also a generalized Vandermonde matrix. A generalized 
Vandermonde matrix is known to have determinant of the form s\(x) Y\.i<j{ x i — x j) where s\(x) is 
a polynomial of degree |A| = Y2i an d called a Schur function [50]. In infinitely many variables 
(not our situation) the Schur function is irreducible [28], but in finitely many variables, the Schur 
function is sometimes irreducible and sometimes not (but there are irreducible Schur functions of 
arbitrarily high degree) [72, Exercise 7.30]. 

We can thus derive the following Theorem and Corollary. 

Theorem 3.46 By Theorem 3.41, no enumerable set of black-box operations of bounded degree can 
compute all Schur functions accurately when the Xi are complex. 

Corollary 3.47 No enumerable set of black-box operations of bounded degree or of bounded number 
of arguments exists that will accurately evaluate all minors of complex generalized Vandermonde 
matrices in the generic case. 

If we restrict the domain T> to be nonnegative real numbers, then the situation changes: The 
non-negativity of the coefficients of the Schur functions shows that they are positive in P, and 
indeed the generalized Vandermonde matrix is totally positive [45]. 

Combined with the homogeneity of the Schur function, Theorem 3.6 implies that the Schur 
function, and so determinants (and minors) of totally positive generalized Vandermonde matrices 
can be evaluated accurately in classical arithmetic (and the algorithms mentioned in Section 2 are 
more efficient than the algorithm used in proving Theorem 3.6. 

Now consider a polynomial Vandermonde matrix Vp defined by a family {Pfc(x)}fc g N of poly- 
nomials such that deg(-Pfc) = k, and Vp(i,j) = Pj-i(xi). Note that these are included in the class 
of generalized Vandermonde matrices, and that the difference lies in the fact that for polynomial 
Vandermonde, the sequence of degrees is increasing and without gaps. 

Note that any Vp can be written as Vp = VC, with V being a regular Vandermonde matrix, 
and C being an upper triangular matrix of coefficients of the polynomials P^, i.e., 

3 

Pj-^x) = Y,C(i,j)x 1 - 1 , VI < j < n . 

i=l 

Denote by c%—\ := D(i,i), for all 1 < i < n the highest-order coefficients of the polynomials 
P (x), . . .,P n -i(x). 

The following two results are proved informally in [13, Section 5]. 
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Theorem 3.48 The set of principal minors of polynomial Vandermonde matrices includes poly- 
nomials which have irreducible factors of arbitrarily large degree. 

Corollary 3.49 By Theorem 3.4-1, the set of polynomial Vandermonde matrices contains matrices 
whose inverses cannot be evaluated accurately even with the addition of any enumerable set of 
bounded- degree black boxes. 

We can also say something about the LDU factorizations of polynomial Vandermonde matrices. 
With the matrix C being the upper triangular matrix of coefficients of the polynomials Pk, we can 
write C = DC, with D being the diagonal matrix of highest-order coefficients, i.e., D(i, i) = C(i, i) 
for all 1 < i < n. We will assume that the matrices C and D are given to us exactly. 

If we let Vp = LpDpUp and V = LDU, it follows that 

L P = L; 
Dp = DD ; 

Up = b~ x uc . 

Since we cannot compute L accurately in the general Vandermonde case, it follows that we 
cannot compute Lp accurately in the polynomial Vandermonde case. Likewise, neither the SVD 
nor the symmetric eigenvalue decomposition (EVD) are computable accurately, but if the polyno- 
mials are certain orthogonal polynomials, then the accurate SVD is possible [22], and an accurate 
symmetric EVD may also be possible [24]. 

3.5.3 Positive results: using extended arithmetic 

Table 1 gathers together structured matrix classes for which it has been established whether and 
which accurate linear algebra algorithms exist. For some matrix classes, it was deduced that 
accurate class-algorithms do not exist, from the fact necessary condition (having an accurately 
evaluable determinant) was violated. 

In this section, we explain how we can use the sufficiency condition for complex matrices devel- 
oped in Section 3.4.2. 

Consider complex polynomial Cauchy matrices, defined (in their simplest form) as follows. Let 
p and q be complex polynomials of one variable. Let now, using MATLAB notation, 

Xi : = p(xi) , VI < i < m 
Vj ■ = q(Vj) , VI < j < m . 

Definition 3.50 We call the matrix C = (Cij) with Cij = x .^_y. where x^ and yj are as above a 
polynomial Cauchy matrix. 

Definition 3.51 Let 

Q~{xi,Vj) = P(xi) - q(y~j) , 

Q + {xi,yj) = p{xi) + q(yj), 

be complex polynomials over C 2 . 
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Recall that the determinant of the Cauchy matrix C is 



detC 



Ui,j( x i - x j)(yi - Vj) 



WiAxi + vj) 



(14) 



Although our models of arithmetic do not incorporate division, computers do perform division 
by a non-zero number as an accurate operation. Therefore, given accurate division and black-box 
algorithms for computing the polynomials Q~ and Q + , one immediately has a simple and accurate 
algorithm to evaluate any minor for the matrix C, therefore any linear algebra operations can be 
easily performed on C (this algorithm is guaranteed by Theorem 3.38). 

In fact, we can obtain a much more general result. 



Theorem 3.52 Let $ be a formula satisfying NIC and depending on variables x±, 



Let 



p be a polynomial (resp., let {pi}™ be a set of polynomials), and let X{ = p(A(i, 1 : m)) (resp., 
Pi(A(i, 1 : m))) for some matrix of parameters A. 

We can accurately evaluate <3? on the new set of inputs depending on the parameters of A, 
provided that we build three (resp., m 2 + 2m) black boxes, computing 



V 

Q + {yi,--- ,Vn,zi, 

Q~(yi,--- ,y n ,zi, 

respectively, for all 1 < i < j < m. 




,Z n ) =p(yi, 
,z n ) =p{yi, 



,y n )+p(zi,---,p n ) 
,y n ) - p(zi, . . . ,p n ) 



,y n ,zi,...,z n ) = pi(yi,. . . ,y n ) + pj {z\ , . . . , p n ) 
,y n ,zi,...,z n ) =Pi{yi,-..,y n ) -Pj{zi,...,p n ) 



Another class of matrices which admit accurate linear algebra algorithms in extended arithmetic 
are the Green's matrices, which arise from discrete representations of Sturm-Liouville equations. 
These matrices are inverses of irreducible tridiagonal matrices. 

Generic Green's matrices have a simple four- vector representation (see, for example, [39], [59]), 

as 

if i > j 



F; 



'•J 



aibj, 

C-i dj , 



if i < j 
d = (di, 



for a= (ai,...a n ), b= (b% , . ._. , b n ) , c = (ci,...,c n ), d= (di,...,d n ), and 1 < i,j < n. 

The case when a = c and b = d, i.e., the symmetric case, has been particularly well-studied (see 
[29], [45]), and we describe it it a bit more detail. 

h %2 ■■■ i P 
h h ■■■ j P 
x y 



We use the notation X 



for the minor of matrix X corresponding to rows 



n, 



and columns j\, . . . ,j p , and 



z t 



for the determinant (xt — yz). 



All minors of symmetric Green's matrices have a simple representation (following [45]) as 
G 



n 12 



V 



CSfel 



a k 2 


ah 






ai 2 




o-k p 




h 2 


K 




h 3 


K 




bh 

nip 
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where k m = min(i m ,j m ) and l m = max(z m , j m ). 

Similarly, all minors of generic Green's matrices can be shown (through a simple inductive 
argument) to be either or products of linear and quadratic factors. Here, by "linear factor" we 
mean a factor of the type Oj, bj, Ck, or di, and by "quadratic factor" we mean a factor of the type 
xt — yz, with x, y, t, z being entries of a, b, c, d. 

We can then conclude that, given a black box computing p(x, y, z, t) : =xt — yz accurately, by 
Theorem 3.38 one can compute all minors of generic Green's matrices. Therefore, as was observed 
in [18], one can evaluate all the minors of generic Green's matrices, and consequently perform linear 
algebra accurately. 

Green's matrices belong to the class of Hierarchically semi- separable or HSS matrices. There 
are many definitions of the latter, one of them being that HSS matrices of order k £ N are matrices 
for which any off-diagonal submatrix has rank no bigger than k. Other examples are tridiagonal 
matrices, banded matrices, inverses of banded matrices, etc. The HSS matrices are extremely useful 
as preconditioners, and arise in many applications. Since determinants of tridiagonal matrices with 
independent indeterminates as entries are irreducible, and tridiagonals are special cases of HSS 
matrices, some (and perhaps all) HSS matrices do have irreducible determinants. 

Still, we believe that further investigation of the large class of HSS matrices may yield other 
examples of subclasses for which simple black-box operations could be constructed in order to 
accurately compute minors, and therefore, be able to perform linear algebra accurately. 

4 Other Models of Arithmetic 

Though the arithmetic models in this paper use real (or complex) numbers and rounding errors, 
our goal is to draw conclusions about practical finite precision computation, i.e., with numbers 
represented as finite bit strings (e.g., floating point numbers). In such a bit model, all rational 
functions of the arguments can be computed accurately, even exactly, because the arguments are 
rational; the only question is cost. In this section we draw conclusions about cost from our analysis. 

We would like to quantify our intuition that, for example, it is much cheaper to accurately 
compute the determinant of an n-by-n Vandermonde matrix with the familiar formula than with 
Gaussian elimination with sufficiently high precision arithmetic. We do not mean the difference 
between 0(n 2 ) and 0(n 3 ) arithmetic operations, but the difference in cost between low precision 
and high precision arithmetic. To quantify this cost, we need to pick a number representation. 

We will assume that "failure" is not allowed, i.e., neither overflow nor underflow is permitted, 
so that intermediate (and final) results can grow or shrink in magnitude as needed to complete the 
computation. 

We claim that the natural representation to use is the pair of integers (e, m) to represent m-2 e , 
i.e., binary floating point. Pros and cons of various number models are discussed in [13], but we 
restrict ourselves here to explaining why we choose floating as opposed to fixed point, which is also 
widely used for analysis (in fixed point, m • 2 e would be represented using up to e explicit zeros 
before or after the bits representing m). 

One can of course represent the same set of (binary) rational numbers in both fixed and floating 
point, but floating point is much more compressed: It takes about log 2 |e| +log 2 \m\ bits to represent 
(e, m), but about |e| + log 2 \m\ bits to represent m ■ 2 e in fixed point, which is exponentially larger. 

First, as a result of this possibly exponentially greater use of space by fixed point, it is possible for 
a sequence of n fixed-point arithmetic operations to take time exponential in n (repeated squaring 
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doubles the length of result at each step, even if only a fixed number of most significant bits are 
kept). In contrast, n floating point arithmetic operations with fixed relative error take time that 
grows at worst like 0(n 2 ) (attained by repeated squaring again, which adds one bit to e at each 
squaring). In particular, any of the expressions in earlier sections of this paper can be evaluated in 
polynomial time in the size of the expression, and the size of their floating point arguments. 

Second, this exponentially greater use of space in fixed-point means that algorithms can appear 
"artificially" cheaper, because they are only polynomial in the input size |e| + log 2 \m\, whereas 
they would not be polynomial as a function of the input size measured as log 2 |e| + log 2 \m\. (This 
is analogous to asking whether an algorithm with integer inputs runs in polynomial time or not, 
depending on whether the inputs are represented in unary or binary.) For example, it is possible to 
accurately compute the determinant of a general matrix with fixed point entries in polynomial time 
in the size of the input [9], but we know of no such polynomial time algorithm with floating point 
entries. Running a conventional determinant algorithm (e.g., Gaussian elimination with pivoting) 
in high enough precision would require roughly log 2 k(A) = log 2 (||A|| • ||A _1 ||) bits of precision, 
which can grow like |e| rather than log 2 |e| (e.g., consider 

A = \ V ~ X V 

I y y+x_ 

for j/>i, where det(^4) = —x 2 ). 

Indeed, the obvious "witness" to identify a singular matrix, a null-vector, can have exponentially 
more nonzero bits than the matrix, as the following example shows. Consider the (2n+l)-by- 
(2n+l) tridiagonal matrix T with Is on the subdiagonal, —Is on the superdiagonal, and diag(T) = 
[x±, x 2 , x n -i,x n , 0, — x n , — x n _i, — X2, —x±\. It is easy to confirm that T is singular, with right 
null vector v = j> 2 , ...,P2n] where pi = det(T(l : i, 1 : i)) is a leading principal minor. If we 

let X{ = 2 ei with e\ = 0, e 2 = 1, and > e%~\ + ej_ 2 , then one can confirm for i < n that pi 
is an integer with fi nonzero bits, where f\ = 1, / 2 = 2, and fi = fi-\ + /j_ 2 is the Fibonacci 
sequence. Since fi grows exponentially, the null vector v has exponentially many bits as a function 
of n, whereas the size of T is at most 0(n log e„), which can be as small as 0(n 2 ). 

Another way to see the difference between fixed and floating point is to consider the simple 
expression niLiU + ^i)- If the Xi are supplied in fixed point, the entire expression can be computed 
exactly in polynomial time. However in floating point, though the leading bits and trailing bits are 
easy, computing some of the bits is as hard as computing the permanent, a problem widely believed 
to have exponential complexity in n [76]. 

Here is the reduction to the permanent. 3 Let A be an n-by-n matrix whose entries are 0s and 
Is. The permanent is the same as the determinant, except that all terms in the Laplace expansion 
are added, instead of some being added and some subtracted. Let and Cj be independent 
indeterminates, and consider the multivariate polynomial 

p(r 1 ,...,r n ,cx,...,c n )= [ (1 + rjCj). (15) 

Then the coefficient k of \\7=\ r i c % i n the expansion of p can be seen to be the permanent. Next 
we replace rj and Cj by widely enough spaced powers of 2, so that every coefficient of every term 

3 We acknowledge Benjamin Diament for having discovered the result relating floating point complexity to the 
permanent. 
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in the expansion of p appears in non-overlapping bits of p evaluated at these powers of 2. Since 
no coefficient can exceed 2 n , and since the sequence of exponents (/ n , fx, e n , e\) in any 
term YYi = i c{ 1 of p can be thought of as the unique expansion of a number in base n + 1, 
one can see that choosing rj = 2 n2 ( n+1 ) 1 1 and cj = 2 n2 ^ n+1 ^" +j 1 suffices. The biggest possible 
product riCj is r n c n = 2" 2 ^ n+1 )" ^O+i) 2 ' 1 x ) < 2 2n2 ( n + 1 ) 2n ) where the exponent takes at most 
log 2 (2n 2 (n + l) 2n ) = 0{n log n) bits to represent, so all the arguments rjCj in the product in (15) 
take 0(n 3 log re) bits to represent. 

Now we consider "black box arithmetic", whose purpose is to model the use of subroutine 
libraries with selected high accuracy operations. We claim that any multivariate polynomial ("black 
box") with t terms of maximum degree d, can be evaluated accurately in polynomial time as a 
function of d, t and the size of the input floating point numbers. The algorithm is simply to 
evaluate each term exactly, and then sum them in decreasing order of exponents, using a register 
of about log 2 i bits more than needed to store the longer term exactly [19, 16]. In particular, any 
enumerable collection of black-boxes that are all bounded in degree d and number of terms t can 
all be thought of as running in time polynomial in the size of their floating point arguments, just 
like the basic operations of addition, subtraction and multiplication. If the number of terms t is 
proportional to the number of inputs (e.g., dot products of vectors of length i), then the cost is 
still polynomial in the input size. 

In summary, in a natural floating point model of arithmetic, the algorithms we have discussed 
run in polynomial time in the size of the inputs, whereas simply running a conventional algorithm 
in sufficiently high precision arithmetic to get the answer accurately can take exponentially longer. 
We know of no guaranteed polynomial-time alternatives to our algorithms. 



5 Structured Condition Numbers 

In this section we begin by recalling some attractive properties of structured condition numbers 
for problems that we can solve accurately, and discuss possible generalizations. If our problem is 
evaluating the function p(x±, ...,x n ), then the structured condition number K strU ct is simply the 
derivative of the relative change in p with respect to relative changes in its arguments: 

\\(xi^,...,x n ^)\\ 

^struct — i i 

\P\ 

where any vector norm may be used in the numerator. 

The simplest before, is for problems described by Theorem 5.12 and Corollary 5.15, 

which say that in the complex case, a necessary and sufficient condition for accurate evaluation of 
complex p(x) using only traditional arithmetic (± and x) is that V{p) be allowable, in which case 
p(x) factors completely into factors of the forms xf , and (xj±Xj)^, where a and (3 are fixed integers. 

This covers many of the linear algebra examples in Section 2. Given such a simple expression it is 

x d v 

easy to evaluate the structured condition number: Each factor xf adds a to — and each factor 

(x i ±x j ) 13 adds \(3xi/(xi±Xj)\ < |/3|/rel_gap(a; i , T%j)- 

Slightly more generally, for expressions satisfying NIC, e.g., including real expressions that only 
add like-signed values, analogous conclusions can be drawn. This is because factors that only add 
like-signed values can only make bounded contributions to the condition number. 
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Given a structured condition number for a decomposition like LDU with complete pivoting (an 
RRD), this essentially becomes a structured condition number for the SVD [15, Thm 2.1]. 

Now we consider the set of ill-posed problems, i.e., the ones whose structured condition numbers 
are infinite. Examining (16), we see that p = is a necessary condition, i.e., the ill-posed problems 
are a subset of V(p). (If p{x) were rational, we would include the poles as well.) For every term 
|/3|/rel_gap(xj, m the structured condition number, the corresponding ill-posed set is defined 
by Xi = ^fXj. All of V(p) is not necessarily ill-posed, since for example small relative changes in x 
only cause small relative changes in p(x) = x a . 

It is natural to ask if there is a relationship between the distance to the nearest ill-posed problem, 
i.e., the smallest relative change to the Xi that make the problem ill-posed, and its structured 
condition number [10]. It is easy to see that for any term |/3|/reLgap(xj, ^Xj) in the structured 
condition number, the smallest relative changes to Xi and =fXj that make it infinite are close to 
rel_gap(xj, T^j) when it is small. In other words, the structured condition number is close to the 
reciprocal of the distance to the nearest ill-posed problem, measured by the smallest relative change 
to the arguments x^. This helps explain geometrically why the structured condition number can 
be so much smaller than the unstructured one: it takes, for example, a much larger perturbation 
to make x% = i — \ and Xj = j ' — \ equal than the smallest singular value of the Hilbert matrix 
Hij = I/fa + xj). 

This reciprocal-condition-number property, that the reciprocal of the condition number is 
approximately the distance to the nearest ill-posed problem, is common in numerical analysis 
[10, 67, 69]. The following simple asymptotic argument shows why: 

If the structured condition number (16) is very large, then some component \xi-^-/p\ S> 1, i.e., 

\p/~&x 7 \ ^ \ Xi \i 01 m °ther words one step of Newton's method xf ew = X{ — p/J^r to find a root of 
p = will take a very small step. Therefore it is plausible that this step pi Sr- is very close to the 

smallest (absolute) distance to the variety in the direction (or an integer multiple of p/J^r is, 
the multiplicity of the root) and dividing by yields the relative distance. 

Now let us go beyond expressions evaluable accurately just using NIC. Consider the case of a real 
positive polynomial or empty variety, as discussed in Section 3.2. The analysis in Theorem 3.5 (resp., 
Theorem 3.6) shows that the relative condition number will grow like 1/pmin (resp., l/Pmin,homo), 
the reciprocal of the smallest value p(x) can take on the appropriate domain. So the relative 
condition number can be arbitrarily large, but in the absence of a variety intersecting the domain 
it remains bounded. 

Based on these examples and analysis, we conjecture that for traditional arithmetic, the follow- 
ing two statements hold. (1) The reciprocal of the structured condition number is an approximation 
of the relative distance from x to the nearest ill-posed problem, perhaps asymptotically. (2) This 
relative distance is approximately given by rel_gap(xj, for some i and j. 

This reciprocal-condition-number property is quite robust as the arguments above suggest, and 
does not necessarily depend on accurate evaluability. For example, if p{x) = (x\ + x<i + x^) a then 
its structured condition number is a||sc||/|xi -\-xi-\-x?\, and \x\ + X2 + X3|/||x||i is indeed the relative 
distance. However, the reciprocal-condition-number property is not universal but depends on the 
structure we impose [66, 68, 70]. Just as this reciprocal condition number property is equivalent 
to the statement that computing the condition number is as sensitive a problem as solving the 
original problem, we conjecture that the structured condition number ^struct c & n only be computed 
accurately if the original problem p can be, at least in the interesting case when K s t ruc t is large. 
This seems reasonable since p{x) ends up in the denominator of Kstruct, so we need to evaluate p 
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accurately near its zeros (or poles). But the numerators dp/dxi could be anything, and perhaps 
even have zeros on unallowable varieties, so to be more precise we conjecture that p can be evaluated 
accurately in some open neighborhood of its zeros (or poles) if and only if K s t rU ct can be. 

6 Conclusions 

In this paper, we have made the case for accurate evaluation of polynomial expressions and accurate 
linear algebra; we have shown that such evaluation is desirable (Section 1), significant (Section 4) 
and often realizable efficiently (Section 2). We have listed, in Section 2, many types of structured 
matrices that have been analyzed from an accuracy perspective in the numerical linear algebra lit- 
erature, while in Section 3 we identified the common algebraic structure that made them analyzable 
in the first place. 

There are limits to how much we can hope to extend the class of structured matrices for which 
linear algebra can be performed accurately; the "negative examples" of Section 3.5 show that, for 
some classes of matrices, accuracy cannot be achieved in finite precision, and both Sections 2 and 
3 mention problems that are impossible to solve in "traditional" arithmetic. The former should 
be seen as "hard" barriers, but the latter should be seen as a challenge, both from theoretical and 
computational perspectives. The theory should aim to provide answers to the question of how 
to extend one's arithmetic by adding "black-box" operations, in order to make these structured 
problems solvable (as we do for the examples of Section 2.3); the computation should design software 
implementing such "black boxes". 

In summary, accurate evaluation is an important area of scientific computing, which has been 
advanced by the recent results presented here. Plenty of work remains in adding to both the 
theoretical framework (which apparently requires familiarity with "pure" mathematical fields such 
as algebraic geometry, topology, and analysis) and to the practical one (software implementation). 
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